Preface

The invention of solid-state transistors and integrated circuits spawned the information
age, and the growth over the past 50 years has been phenomenal and unrivaled. Nowadays,
information is at people's fingertips and communications take seconds rather than the days
they took 20 years ago. Such rapid progress stems from tremendous developments in both
hardware, such as solid-state circuits, and software. The field of integrated circuits has
obeyed Moore's Law for 40 years, but as materials are pushed to their limits, scientists
and engineers are finding it harder to continue on the trend predicted by Gordon Moore.
Approaches such as parallel processing, new circuit design, and particularly novel materials
are necessary. This book brings together contributions from experts in these fields to describe
the current status of important topics in solid-state circuit technologies. It consists of 20
chapters, which are grouped under the following categories: general information, circuits
and devices, materials, and characterization techniques.

The first two categories consist of chapters about CMOS nonlinear signal processing
circuits, transconductors, dynamically reconfigurable devices, new unified random access
memory devices, low-voltage fully differential CMOS switched-capacitor amplifiers,
low-voltage, highly linear, tunable, and multi-band active RC filters, multi-clad single mode
optical fibers for broadband optical networks, continuous-time analog filters for CMOS and
VHF applications, CMOS low noise amplifiers, PCM performance, ESD protection elements,
directional tuning control of wireless/contactless power pickup for inductive power
transfer systems, regulated gate drivers in CMOS, millimeter-wave CMOS, CMOS
integrated switched-mode transmitters, and metal-oxide-semiconductor memories and
transistors. The chapters covering materials science and engineering include hafnium-based
high-k gate dielectrics, liquid phase oxidation on InGaP and applications, as well as
germanium-doped Czochralski silicon. The final two chapters pertain to miniature
dual-axes confocal microscopy for real-time in vivo imaging and a scanning near-field Raman
spectroscopic microscope.

These chapters have been written by renowned experts in their respective fields, making
this book valuable to the integrated circuits and materials science communities. It is
intended for a diverse readership including electrical engineers and materials scientists in
industry and academic institutions. Readers will be able to familiarize themselves with the
latest technologies in the various fields. In addition, each chapter is accompanied by an
extensive list of references for those who want to obtain more detailed information and
perform more in-depth research.

The tremendous cooperation from contributing authors who devoted their valuable 
time to write these excellent chapters and meticulous assistance provided by the editorial 
staff to make this book a reality are highly appreciated. 

Editor 
Paul K. Chu 

City University of Hong Kong 
CMOS Nonlinear Signal Processing Circuits

Hung, Yu-Cherng 

National Chin-Yi University of Technology 

Taiwan, R.O.C. 



1. Introduction 

In VLSI circuit design, nonlinear signal processing circuits such as minimum (MIN),
maximum (MAX), median (MED), winner-take-all (WTA), loser-take-all (LTA), k-WTA, and
arbitrary rank-order extraction are useful functions (Lippmann, 1987; Lazzaro et al., 1989).
In general, a median filter is used to filter impulse noise so as to suppress impulsive
distortions. The MAX and MIN circuits are important elements in fuzzy logic design. The
WTA function is central to pattern classification and artificial
neural networks. Designing these nonlinear signal-processing circuits so that they integrate
smoothly into SoC (system-on-a-chip) applications has therefore become an important research topic.
Complementary metal-oxide-semiconductor (CMOS) technology is now widely used to
fabricate various chips, and all circuits in this chapter are realized in a
CMOS process. However, as CMOS transistors are continuously scaled down through thinner
gate oxides and reduced device sizes, the supply voltage must be reduced in order to
improve device reliability. Therefore, a highly reliable WTA/LTA circuit, a simple MED
circuit, and a low-voltage rank-order extractor are addressed in this chapter. The
chapter is organized as follows. Section 1 introduces the background of these
nonlinear functions, including definitions and applications. Section 2 describes conventional
WTA/LTA architectures and presents a highly reliable winner-take-all/loser-take-all circuit.
Section 3 presents an analog median circuit whose advantage is its simplicity. Section 4
describes a CMOS circuit design for arbitrary rank-order extraction; restrictions and design
techniques for low-voltage CMOS circuits are also addressed. Section 5 briefly concludes
the chapter.

Given a set of n external input variables a_1, ..., a_n, the MAX (or MIN) circuit
determines the maximum (or minimum) value. A median filter outputs the median variable
within a window of input samples. The function of a WTA network is to select and identify
the largest variable from a specified set of variables. Its counterpart, the LTA, identifies
the smallest input variable and inhibits the remaining ones. Instead of choosing only one winner,
the k-WTA network selects the k largest among n competing variables (k < n),
which allows more flexibility in applications. For arbitrary rank-order identification, a
rank-order filter (extractor) is designed to select the k-th largest element a_k among the n
variables a_1, ..., a_n. Depending on application requirements, these input variables are either
voltage or current signals.

To describe these nonlinear functions clearly, consider one example. A circuit produces
two kinds of output response to a set of input currents I_in1, I_in2, ..., and I_inN:
one is an analog output current I_o; the other is a set of digital outputs V_o1(rank),
V_o2(rank), ..., and V_oN(rank). Assume that the five external input currents are 9, 7, 10, 5, and 3 µA.
Depending on the required function, the output current I_o and the corresponding
digital output responses are as follows.

1. MAX: I_o = Maximum(I_in1, I_in2, ..., I_inN) = I_in3 = 10 µA

2. MIN: I_o = Minimum(I_in1, I_in2, ..., I_inN) = I_in5 = 3 µA

3. MED: I_o = Median(I_in1, I_in2, ..., I_inN) = I_in2 = 7 µA

4. WTA: The output voltages V_o1(rank), V_o2(rank), ..., and V_o5(rank) respond with logic high to identify
which input is the maximum among I_in1, I_in2, ..., and I_inN. In this case, (V_o1(rank),
V_o2(rank), ..., V_o5(rank)) = (0, 0, 1, 0, 0), where "0" and "1" are logic low and logic high,
respectively.

5. LTA: The reverse operation of the WTA function; the output set is (0, 0, 0, 0, 1) in this case.

6. k-WTA: Depending on the value of k, k winners are selected. The function is more flexible in
application than WTA. For example, the output of 2-WTA is (V_o1(rank), V_o2(rank), ...,
V_o5(rank)) = (1, 0, 1, 0, 0) in this case.

7. Rank order: The rth rank-order extraction identifies the rth largest
magnitude among I_in1, I_in2, ..., and I_inN. For example, the outputs of the 2nd and 3rd rank
orders are (1, 0, 0, 0, 0) and (0, 1, 0, 0, 0), respectively, in this case.
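The seven definitions above can be checked against the five-input example with a short behavioural sketch. It models only the input/output relations, not any circuit; the function name `nonlinear_functions` and the dictionary keys are illustrative choices, and the median line assumes an odd number of inputs.

```python
def nonlinear_functions(currents, k=2, r=2):
    """Evaluate the nonlinear functions for a list of input currents (in uA)."""
    n = len(currents)
    # Input indices ordered from the largest to the smallest current.
    order = sorted(range(n), key=lambda i: currents[i], reverse=True)

    def one_hot(indices):
        # Digital output set: logic 1 for selected inputs, logic 0 otherwise.
        return tuple(1 if i in indices else 0 for i in range(n))

    return {
        "MAX": max(currents),
        "MIN": min(currents),
        "MED": sorted(currents)[n // 2],   # middle value, n assumed odd
        "WTA": one_hot(order[:1]),         # largest input only
        "LTA": one_hot(order[-1:]),        # smallest input only
        "k-WTA": one_hot(order[:k]),       # the k largest inputs
        "rank": one_hot(order[r - 1:r]),   # the r-th largest input
    }


example = nonlinear_functions([9, 7, 10, 5, 3], k=2, r=2)
third = nonlinear_functions([9, 7, 10, 5, 3], r=3)["rank"]
```

Calling the function with `r=3` gives the 3rd rank-order output for the same inputs.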



Rule 1: IF x is PL and y is ZR, then z is NS.
Rule 2: IF x is ZR and y is NL, then z is ZR.

Fig. 1. Applications of MIN and MAX operations in fuzzy inference.

Fig. 2. Application of MED filter.

Fig. 3. Two-dimensional application of MED filter.

Fig. 4. Applications of WTA/LTA function in artificial neural network.

Various applications of these nonlinear functions are described as follows. The MAX and
MIN circuits are important elements in fuzzy logic design (Yamakawa, 1993). Fig. 1 shows
the MAX and MIN operations in fuzzy inference. Variables "x" and "y" are the inputs; variable
"z" is the corresponding output response. In a specific state, either rule 1 or rule 2 is
satisfied. The MIN function realizes the "and" operation in fuzzy rules, and the MAX function
realizes the "or" operation. In image signal processing, the MED function is generally used to
filter impulse noise so as to suppress impulsive distortions. Figure 2 shows a one-dimensional
application for noise cancellation. Fig. 2(a) shows a 1.2-V peak-to-peak sinusoidal signal
corrupted by noise, and Fig. 2(b) shows the processed signal after MED filtering with a
window of size five. Figure 3 shows a two-dimensional application, also for noise
cancellation, in an image. The WTA function is central to pattern
classification, vector quantization, data compression, and self-organizing neural networks.
Figure 4 shows a WTA application for pattern identification. An analogue rank-order
filter is commonly used in signal sorting and classification.
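The one-dimensional noise-cancellation case can be illustrated with a software median filter. This is a minimal sketch, not the circuit of Fig. 2: the sample count, the impulse positions and amplitude, and the edge handling (repeating the end samples) are assumptions made for the illustration; only the 1.2-V peak-to-peak sinusoid and the window of five come from the text.

```python
import math


def median_filter(samples, window=5):
    """Sliding-window median; edges are handled by repeating the end samples."""
    half = window // 2
    padded = [samples[0]] * half + list(samples) + [samples[-1]] * half
    # The median of each window is the middle element after sorting.
    return [sorted(padded[i:i + window])[half] for i in range(len(samples))]


# 1.2-V peak-to-peak sinusoid (0.6-V amplitude), 50 samples per period.
clean = [0.6 * math.sin(2 * math.pi * n / 50) for n in range(100)]

# Add isolated impulses at hypothetical sample positions.
noisy = list(clean)
for n in (10, 37, 64, 91):
    noisy[n] += 1.0

filtered = median_filter(noisy, window=5)
worst_before = max(abs(a - b) for a, b in zip(noisy, clean))
worst_after = max(abs(a - b) for a, b in zip(filtered, clean))
```

Because each impulse occupies a single sample, it can never be the middle element of a five-sample window, so `worst_after` stays within a couple of sample-to-sample steps of the sinusoid while `worst_before` equals the impulse height.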

In general, these nonlinear functions are implemented either digitally or in analog form.
In a digital implementation, since most signals obtained from the real
world are continuous, the inputs must first be converted to digital form
by one or more analog-to-digital (A/D) converters. As a result, the circuit
complexity, chip area, and power consumption increase because of the extra data
converters. An analog implementation, by contrast, is
slightly less accurate than a digital one and less tolerant of fabrication process
variation. However, without the extra data conversion, analog operation offers
advantages such as savings in time, bandwidth, and computation at the system level.
Considering practicality and flexibility, the design issues for a CMOS analog signal
processing circuit therefore include 1) precision; 2) speed; 3) high tolerance to
fabrication process variation; 4) wide range of supply voltage; 5) wide input range; 6) low
circuit complexity; 7) low power consumption; 8) scalability; 9) programmability, and so
forth, so that these functions can be integrated easily into various system-embedded chips.
Additionally, as CMOS transistors are made thinner and smaller,
the supply voltage must be scaled down in order to improve device reliability. A forecast
of high-performance CMOS circuits operating at low voltage has been reported
(Semiconductor Industry Association, 2008). Figure 5 shows the trend of CMOS supply
voltage and physical gate length. Moreover, portable equipment such as biomedical
electronics, computers, and portable telecommunication devices has become common.
Battery operation and low power consumption are therefore also important design
requirements for these circuits.



[Plot: supply voltage (V) and physical gate length (nm) versus year, 2007 to 2021.]

Fig. 5. Trend for supply voltage and physical gate length by ITRS 2008 update.



2. Winner-Take-All and Loser-Take-All circuit 

2.1 Architectures of WTA/LTA circuits 

Based on their circuit structures, conventional WTA/LTA circuits can be roughly categorized
into four types: 1) global-inhibition structure, in which the connectivity increases linearly
with the number of inputs (Lazzaro et al., 1989; Starzyk & Fang, 1993); 2) cell-based tree
topology (Smedley et al., 1995; Demosthenous et al., 1998); 3) excitatory/inhibitory
connection (He & Sanchez-Sinencio, 1993); and 4) serial cascade structure (Aksin, 2002).
Figure 6(a-d) shows the conceptual diagrams of these topologies. In Fig. 6(a), each cell
receives the same global inhibition, and a common current I_comn or voltage V_comn is shared by
all the competing cells. The cells, drawn as square blocks, are nonlinear signal
processing elements; the precision of the circuit therefore degrades as the number of
inputs increases. Since the operation of this circuit relies on cell matching, a stable
fabrication process is required to manufacture a high-precision system. The connectivity
complexity of the circuit is O(N), where N is the number of inputs. Figure 6(b) shows a
cell-based tree topology, with N-1 cells arranged in a tree for N inputs. Each cell
receives two input variables, compares them, and outputs the larger (or smaller) of the two
signals. The decision digits from the bottom cell are then fed back successively to the first-layer cells
to identify the maximum (or minimum) input. The precision of this circuit is also sensitive to
cell matching. With this circuit design, the device sizes must be rescaled when the supply
voltage is modified.

Fig. 6. Conventional architectures: (a) global-inhibition structure, (b) cell-based tree
topology, (c) excitatory/inhibitory connection, (d) serial cascade.

Figure 6(c) shows an excitatory/inhibitory connection with O(N^2) connectivity
complexity. Each cell receives inhibitory signals from the other cells and an excitatory signal
from itself. With this design, chip area increases with the square of the number of inputs.
Figure 6(d) shows a comparator-based structure in which N-1 analog comparison blocks and
N-1 digital blocks are cascaded in series. Within a comparison time T_comp, the larger
of the inputs to each analog block is sent to the next stage to be compared with the other inputs. The result
of each comparison is then sent to the corresponding digital block, and a decision digit is
fed back from the right block to the left block to identify the maximum input. As a result, the
response time of the circuit is approximately (N-1)·T_comp + T_dig, where T_dig is the total
propagation time of the digital part. The offset voltage of each comparator dominates the
precision of the architecture. The circuit implementation of Fig. 6(d) is also sensitive to process
variation. For a high-precision application, the internal circuit blocks shown in Figs.
6(a-d) must be identical. The primary limitations on the accuracy of the conventional architectures
are fabrication process variations and the matching requirement of the internal cells.
CMOS process variations affect the transistor threshold voltage, actual device size, thickness
of the gate oxide, and a variety of other factors. In a typical process, the threshold voltage
generally varies from -10% to +10% of its nominal value. Because of non-uniform etching and
diffusion, actual device sizes also vary. In a real CMOS process, these
variations are hard to eliminate completely. How can the accuracy of an analog
circuit be improved in a conventional process?

2.2 A highly reliable WTA/LTA circuit

In this section, a highly reliable CMOS signal processing circuit with programmable
WTA and LTA functions is described (Hung & Liu, 2004). The symbol
COMP_t^i(V_inj, V_ink) (1 ≤ j, k ≤ N, where N is the number of inputs) denotes that the i-th
comparator cell receives two input variables (V_inj and V_ink) and compares their magnitudes at time
t; the output Z_t^i of the cell is either the larger variable or a binary value. For a
COMP_t^i(V_inj, V_ink) operation, Z_t^i is defined as

$$
Z_t^i =
\begin{cases}
1 \ \text{or} \ V_{inj}, & \text{when } V_{inj} > V_{ink}, \\
0 \ \text{or} \ V_{ink}, & \text{otherwise.}
\end{cases}
$$



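Returning to the conventional tree topology of Fig. 6(b), its operation in WTA
mode is therefore represented as:

$$
\begin{aligned}
t_1 &: \ \mathrm{COMP}_{t_1}^{1}(V_{in1}, V_{in2}),\ \mathrm{COMP}_{t_1}^{2}(V_{in3}, V_{in4}),\ \ldots,\ \mathrm{COMP}_{t_1}^{N/2}(V_{in(N-1)}, V_{inN}) \\
t_2 &: \ \mathrm{COMP}_{t_2}^{(N/2)+1}(Z_{t_1}^{1}, Z_{t_1}^{2}),\ \mathrm{COMP}_{t_2}^{(N/2)+2}(Z_{t_1}^{3}, Z_{t_1}^{4}),\ \ldots \\
&\ \ \vdots \\
t_{\log_2 N} &: \ \mathrm{COMP}_{t_{\log_2 N}}^{N-1}(Z_{t_{\log_2 N - 1}}^{N-3}, Z_{t_{\log_2 N - 1}}^{N-2}).
\end{aligned}
$$

After a time of O(log_2 N), the maximum (or minimum) input variable is obtained. A total of N-1
identical comparators is necessary for this operation.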
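The comparator and round counts of the tree topology can be confirmed with a behavioural sketch of the pairwise tournament. It is a software illustration only: the function name `tree_select`, the handling of an odd number of survivors, and the tie-breaking rule are assumptions, not part of the chapter.

```python
def tree_select(values, find_max=True):
    """Pairwise tournament: return (winner_index, rounds, comparators_used)."""
    if find_max:
        better = lambda a, b: a > b
    else:
        better = lambda a, b: a < b

    survivors = list(range(len(values)))   # indices still competing
    rounds = 0
    comparators = 0
    while len(survivors) > 1:
        next_layer = []
        # Each pair in the current layer needs its own comparator cell.
        for j in range(0, len(survivors) - 1, 2):
            a, b = survivors[j], survivors[j + 1]
            next_layer.append(a if better(values[a], values[b]) else b)
            comparators += 1
        if len(survivors) % 2:             # odd one out advances unopposed
            next_layer.append(survivors[-1])
        survivors = next_layer
        rounds += 1
    return survivors[0], rounds, comparators


inputs = [9, 7, 10, 5, 3, 8, 1, 6]         # N = 8 illustrative values
max_result = tree_select(inputs, find_max=True)
min_result = tree_select(inputs, find_max=False)
```

For N = 8 the tournament finishes in log_2 8 = 3 rounds and uses N-1 = 7 comparisons, each of which corresponds to a separate, matched comparator cell in the tree architecture.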

Fig. 7. A highly reliable WTA/LTA architecture.

To reduce the matching requirement of the internal cells, Figure 7 shows a conceptual diagram of
a highly reliable circuit. In this scheme, there are N identical 'digital' control cells and a single
comparator for N input variables. The single comparator block is time-multiplexed to perform
all input comparisons. The operating procedure is as follows:

$$
\begin{aligned}
t_1 &: \ \mathrm{COMP}_{t_1}^{1}(V_{in1}, V_{in2}) \\
t_2 &: \ \mathrm{COMP}_{t_2}^{1}(Z_{t_1}^{1}, V_{in3}) \\
&\ \ \vdots \\
t_{N-1} &: \ \mathrm{COMP}_{t_{N-1}}^{1}(Z_{t_{N-2}}^{1}, V_{inN}).
\end{aligned}
$$



The strategy adopted to find the maximum/minimum among a set of variables is as follows: two
variables are first compared; the result of this comparison is then compared with the next
input variable using the same comparator. The procedure continues until the comparisons
of all input variables are completed. Conceptually, the circuit operation is a serial
comparison. Unlike the traditional architectures, which require N-1 analogue comparators, this
architecture requires only a single comparator, which removes the sensitivity to component
matching. Using the same algorithm, the LTA function is easily obtained by
simply reversing the output state Z_t^i in the same architecture.
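The serial strategy can be sketched in a few lines. This is a behavioural model under stated assumptions, not the circuit: the function name `serial_select` and the `mode` argument are illustrative, and the single `>` expression stands in for the one physical comparator that is reused for every comparison.

```python
def serial_select(values, mode="WTA"):
    """Scan the inputs with one reused comparison; return (winner, comparisons)."""
    winner = 0                      # input 1 is the provisional winner
    comparisons = 0
    for k in range(1, len(values)):
        # The single comparator: is the next input larger than the held winner?
        larger = values[k] > values[winner]
        comparisons += 1
        # LTA reuses the same comparison and only reverses the output state.
        challenger_wins = larger if mode == "WTA" else not larger
        if challenger_wins:
            winner = k
    return winner, comparisons


wta = serial_select([9, 7, 10, 5, 3], mode="WTA")
lta = serial_select([9, 7, 10, 5, 3], mode="LTA")
```

The number of comparisons is still N-1, but they are spread over time on one comparator, so no matching between separate analog cells is involved.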



[Figure: comparison block with its control signals, showing the auto-zero and comparison phases.]

Fig. 8. Comparison block and control signals.

The key block in this architecture is the comparator cell. Comparator performance is a
crucial factor in realizing high-speed data conversion systems and telecommunication
interfaces. The precision of a comparator is usually defined as the minimum identifiable
differential voltage (or current) between its inputs, that is, the comparator's resolution.
A comparator design from (Hosotani et al., 1990) is used here; the schematic
diagram is shown in Fig. 8. Transistors M_sw1, M_sw2, and M_sw3 are used as switches. The circuit
operates in two phases, an auto-zero phase and a comparison phase. Let the voltage at
node B be V_x. By charge conservation, after the comparison phase V_x becomes

$$
V_x = V_b + (V_{in2} - V_{in1}) \frac{C_s}{C_s + C_p + C_{in}} \tag{1}
$$

The C_s/(C_s + C_p + C_in) term in (1) acts as a degrading factor. To reduce the
decision time, the succeeding inverters amplify the voltage difference (V_in2 - V_in1) to pull node
D up to high (logic 1) or push it down to 0 V (logic 0). The N-latch
samples the voltage at node D when latch_clk goes high and holds the comparison result when
latch_clk goes low. Finally, the output polarity of the N-latch is changed according
to the max/min selector setting. The max/min selector signal modifies the polarity of the
comparison result; therefore, without any structural modification, this circuit
is configurable for win/lose operation. The comparison block shown in Fig. 8 is reused
in all comparison procedures. The architecture of the N-input circuit is shown in Fig. 9, in
which the cells Control_Cell_n (1 ≤ n ≤ N) are identical. N cells are required for N input variables.
Each cell contains a status block, a control_switch block, and two latch blocks.
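A quick numerical reading of Eq. (1) shows how the capacitive divider attenuates a small input difference before the inverters amplify it. In this sketch the sampling capacitance of 3 pF is the value quoted for the experimental chip in Section 2.3; the bias voltage V_b and the parasitic capacitances C_p and C_in are hypothetical values chosen only for illustration.

```python
def comparator_node_voltage(v_b, v_in1, v_in2, c_s, c_p, c_in):
    """V_x at node B after the comparison phase, following Eq. (1)."""
    degrading_factor = c_s / (c_s + c_p + c_in)
    return v_b + (v_in2 - v_in1) * degrading_factor


# Assumed values: V_b, C_p and C_in are illustrative, not from the chapter.
v_b = 1.65                      # volts, hypothetical auto-zero bias point
c_s = 3e-12                     # farads, sampling capacitor (Section 2.3)
c_p = 0.2e-12                   # farads, hypothetical parasitic capacitance
c_in = 0.1e-12                  # farads, hypothetical inverter input capacitance

v_x = comparator_node_voltage(v_b, v_in1=2.000, v_in2=2.003,
                              c_s=c_s, c_p=c_p, c_in=c_in)
shift_mv = (v_x - v_b) * 1e3    # node-B shift in millivolts
```

With these assumed parasitics the degrading factor is 3/3.3, so a 3-mV input difference moves node B by a little under 3 mV; a larger C_s relative to the parasitics brings the factor closer to one.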

Fig. 9. The block diagram of the highly reliable WTA/LTA.

Figure 10 shows the clocks for the whole circuit. The reset signal and the clock reg_clk must be
generated externally; the other clocks are derived from reg_clk by logic gates.
The operation of the entire circuit is described with reference to the circuit architecture in Fig. 9 and the clock
waveforms in Fig. 10. First, at t1, the reset signal initializes the status blocks,
control_switch blocks, and latch blocks. The N-latch in the status block and the signals R_1, R_2,
and the remaining R signals are reset to zero by the reset signal. Depending on the max/min selector signal, the MOS transistors M_s1,
M_s2, M_s3, and M_s4 preset the initial sampling voltage (0 V or Vdd) at node cap_comn.
Regardless of its magnitude, the input-1 variable must be the winner during the
initial interval of a serial comparison. The initial sampling voltage at node cap_comn is thus
set to 0 V when the max/min selector signal is set to logic 1 for WTA operation, and vice
versa.

Fig. 10. Clock waveforms.

Then, at t2, the V_s1 clock goes high (auto-zero phase) to sample the initial voltage (0 V or
Vdd) at node cap_comn. Next, at t3, R_1 goes high to sample the voltage V_in1. At this time, the
clock V_s1 goes low (comparison phase) to compare V_in1 with the initial sampling voltage,
and the comparison result is stored in the N-latch of the first status block. The state of the
N-latch is logic 1 if the variable is the winner. At t4, the present winner V_in1 is sampled again.
At t5, a new comparison between the previous winner V_in1 and V_in2 is performed. At t6, the
winner (the result of the comparison between V_in1 and V_in2) is sampled again. After this procedure, a
new comparison between the present winner and V_in3 is performed. The procedure
continues until the comparison of all the input voltages is completed. In the end, only one state
V_osn (n = 1, ..., N) among these cells is logic 1 to indicate the WTA/LTA result; the others are logic 0.
A WTA or LTA operation has thus been accomplished.

Figure 11 shows the status block, and Figure 12 shows the control_switch block, which receives an
input variable and controls the transmission gate that samples the input level. A true single-phase
latch composed of an N-latch and a P-latch is used to reduce clock skew (Yuan &
Svensson, 1989).

2.3 Simulation results and reliability test

For the highly reliable WTA/LTA circuit, an experimental chip with six inputs was
fabricated using a 0.5-µm CMOS technology. The sampling capacitor C_s, implemented
with two polysilicon layers, is 3 pF. The period of the reg_clk clock is 100 ns with a
50% duty cycle. The WTA/LTA functions, the supply-voltage range, and a Monte Carlo analysis of
transistor variation were also tested by simulation.
1) WTA/LTA functions 

To test the function of the circuit, each example uses ten input voltages for the WTA/LTA
operation. For a supply voltage Vdd = 3.3 V, the input variables V_in1, V_in2, ..., and V_in10 are
0.003, 0.006, 1.000, 0.997, 2.000, 2.003, 2.000, 3.297, 3.300, and 3.297 V, respectively, for testing the WTA
function, and 3.297, 3.294, 2.000, 1.997, 2.000, 1.000, 0.997, 0.006, 0.009, and 0.003
V for testing the LTA function. During the WTA operation, the logic state V_osn of each cell in
each time slice becomes:



V_os1  = 1,0,0,0,0,0,0,0,0,0
V_os2  = 0,1,0,0,0,0,0,0,0,0
V_os3  = 0,0,1,1,0,0,0,0,0,0
V_os4  = 0,0,0,0,0,0,0,0,0,0
V_os5  = 0,0,0,0,1,0,0,0,0,0
V_os6  = 0,0,0,0,0,1,1,0,0,0
V_os7  = 0,0,0,0,0,0,0,0,0,0
V_os8  = 0,0,0,0,0,0,0,1,0,0
V_os9  = 0,0,0,0,0,0,0,0,1,1
V_os10 = 0,0,0,0,0,0,0,0,0,0.



When all comparisons are finished, the outputs V_os1, V_os2, V_os3, ..., and V_os10 are logic 0,
0, 0, 0, 0, 0, 0, 0, 1, and 0, respectively. Therefore, among these ten inputs, input variable V_in9
is the maximum. Figure 13 shows the HSPICE simulation results for the WTA operation.
The period of the latch clock (top trace) is 100 ns. Similarly, Fig. 14 shows
the results for the LTA operation. The final outputs V_os1, V_os2, V_os3, ..., and V_os10 are logic 0, 0,
0, 0, 0, 0, 0, 0, 0, and 1, respectively, and the input variable V_in10 is the minimum. The
test voltages above were chosen for the following reasons: 1) the input voltages of neighboring cells
should be as close as possible to test the discrimination capability; 2) the input voltages are
distributed from 0 V to 3.3 V to test for a wide dynamic range.
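The per-time-slice states listed for the WTA test can be reproduced with a behavioural trace of the serial comparison. The sketch below is an idealized model, not an HSPICE substitute: it assumes an ideal comparator, one comparison per time slice, and a preset of 0 V (WTA) or Vdd (LTA) so that input 1 wins the first slice; the function name `trace_states` is illustrative.

```python
def trace_states(inputs, mode="WTA", vdd=3.3):
    """Return one row per cell: that cell's logic state in each time slice."""
    n = len(inputs)
    # Preset the held voltage so that input 1 wins the first comparison.
    held = 0.0 if mode == "WTA" else vdd
    winner = 0
    states = [[0] * n for _ in range(n)]
    for t, v in enumerate(inputs):
        wins = v > held if mode == "WTA" else v < held
        if t == 0 or wins:
            winner, held = t, v            # the new winner is sampled and held
        states[winner][t] = 1              # only the current winner reads logic 1
    return states


wta_inputs = [0.003, 0.006, 1.000, 0.997, 2.000, 2.003, 2.000, 3.297, 3.300, 3.297]
lta_inputs = [3.297, 3.294, 2.000, 1.997, 2.000, 1.000, 0.997, 0.006, 0.009, 0.003]

wta_states = trace_states(wta_inputs, mode="WTA")
lta_states = trace_states(lta_inputs, mode="LTA")
wta_final = [row[-1] for row in wta_states]    # outputs after the last slice
lta_final = [row[-1] for row in lta_states]
```

Each row of `wta_states` corresponds to one V_osn sequence, and the final columns give the outputs reported after all comparisons are finished.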
2) Supply voltage range 

All circuit parameters, such as transistor dimensions, clock periods, and the sampling
capacitance C_s, are held constant. The supply voltage Vdd is swept from 2 V to 5 V in
0.1-V steps, and the logic-high level of the clocks is adjusted accordingly. The simulation
results show that the circuit
operates successfully with 3-mV discrimination when the supply voltage ranges from 2.7
V to 5 V. Without any rescaling of device sizes, the circuit works under
various commonly used supply voltages.

Fig. 13. Simulation results of the WTA operation.

Fig. 14. Simulation results of the LTA operation.

3) Process variations 

A statistical distribution of manufacturing parameters is inherent in CMOS
fabrication. Wafer-to-wafer, run-to-run, and transistor-to-transistor process variations
determine the electrical yield and critical second-order effects. The threshold voltage, channel
width, and channel length of every MOS transistor were set to their nominal values with ±5%
variation at the 3-sigma level, and each transistor was given an independent random
Gaussian distribution. After 30 Monte Carlo iterations, the HSPICE results indicate that circuit
precision and speed are not degraded over this range. In addition, to verify that the circuit
supports multiple technologies, its performance was also simulated with various CMOS fabrication
parameters. The results show that the circuit
remains functional under various fabrication processes without tuning any device
dimension. Two factors contribute to the robustness of this circuit: 1) the circuit is
designed with only a single analog cell (the comparator), while the other active components are
digital; 2) the comparator itself has an auto-zero property, so its
operation is more tolerant of manufacturing process variation.
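The parameter sampling behind such a Monte Carlo run can be sketched as follows. This only illustrates how independent Gaussian deviations with a 3-sigma spread of 5% might be drawn for each transistor; it does not simulate the circuit. The nominal values, the number of transistors per run, and the function name `sample_transistor` are hypothetical.

```python
import random


def sample_transistor(nominal, rel_3sigma=0.05, rng=random):
    """Draw one transistor's parameters; 3 sigma equals 5 % of each nominal value."""
    return {name: rng.gauss(value, value * rel_3sigma / 3.0)
            for name, value in nominal.items()}


# Hypothetical nominal values (V, um, um); not taken from the chapter.
nominal = {"vth": 0.7, "width": 5.0, "length": 0.5}

rng = random.Random(0)                 # fixed seed so the runs are repeatable
n_iterations = 30                      # 30 Monte Carlo iterations, as in the text
n_transistors = 20                     # assumed transistor count per run
runs = [[sample_transistor(nominal, rng=rng) for _ in range(n_transistors)]
        for _ in range(n_iterations)]

# Largest relative deviation of any sampled parameter from its nominal value.
worst = max(abs(t[name] / nominal[name] - 1.0)
            for run in runs for t in run for name in nominal)
```

Each of the 30 parameter sets would then be passed to a circuit simulator; the decision about whether precision and speed degrade is made on the simulated waveforms, not on the samples themselves.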
4) Circuit precision 

The accuracy of the comparator cell dominates the identification precision. The comparator
accuracy depends on two factors. One is the clock feed-through and charge-injection
error in transistor M_sw3, shown in Fig. 8; the other is the degrading factor in Eq. (1).
Charge-injection error is a complicated function of substrate doping concentration, load
capacitance, input level, clock voltage, clock fall rate, MOS channel dimensions, and
threshold voltage. This error is therefore difficult to eliminate completely. In general,
complementary clocks, transmission gates, and dummy transistors are adopted in the switch
realization to reduce the error.

3. CMOS analogue median cell 

The median (MED) filter is a useful function in image processing for eliminating impulse
noise. Given a set of n external input variables a_1, ..., a_n, the MED circuit
determines the median value. Median extraction is a nonlinear function.
MED circuit realizations can be classified as analog or digital filtering, depending
on the type of input signal. The digital filtering architecture is supported by a variety of
sophisticated algorithms and therefore offers higher
flexibility and reliability. In terms of power consumption and chip area,
however, it is more expensive than the analog architecture. In 1994, an analogue median
extractor with a simple structure and a sharp DC transfer characteristic, built without an
operational amplifier, was presented (Opris & Kovacs, 1994). The circuit is intended to
reduce the errors in the transition region. In 1997, the same authors proposed an improved version
with high-speed operation; the median circuit has a transient recovery of less
than 200 ns in a 2-µm CMOS process (Opris & Kovacs, 1997). In 1999, a current-input
analog median filter composed of absolute-value and minimum circuits was proposed
(Vlassis & Siskos, 1999). This circuit also requires neither an operational amplifier nor a
transconductor. Based on transconductance comparators and analog delay elements, a
fully continuous-time analog median filter was presented in 2004 (Diaz-Sanchez et al., 2004).
Using these median filter cells, an image of 91x80 pixels can be processed in less than 8 µs
to remove salt-and-pepper noise. In this section, an intuitive and simple CMOS analog
median cell is described (Hung et al., 2007). It is built from current mirrors, current comparison,
and some basic digital logic. Simulation in a TSMC
0.35-µm CMOS technology shows that the median filter provides 0.4-µA
discriminability and tracks the median value of the input currents well.
Figure 15 shows a basic one-input current cell composed of current mirrors and control logic
circuits. The cell has one signal input (i_s), a current source output (i_s_src), a current sink
output (i_s_sink), a control signal V_ctr, and an output current (i_out). Transistors M1-M12 form
cascode current mirrors. M_swp and M_swn constitute a transmission gate that acts as an analog switch.
M_dummy compensates for the loading of M_swn and M_swp to improve the
accuracy of the output current. M_iso isolates the clock noise of the transmission gate.
M_dis1-2 and M_res speed up the transmission operation and control the discharge
timing. Fig. 15(b) is the symbol corresponding to Fig. 15(a); the cell is named the
current signal control unit, abbreviated as CSCU.




[Figure: (a) CSCU circuit; (b) CSCU symbol.]

Fig. 15. Current signal control unit (CSCU): (a) circuit and (b) symbol representation.

Given three input signals i_s1, i_s2, and i_s3, how can the circuit extract the median value? Assume that i_s2 is
the median current. The following criterion must then be satisfied:

$$
\mathrm{MED}(i_{s1}, i_{s2}, i_{s3}) = i_{s2} \quad \text{if} \quad
\begin{cases}
(i_{s2} > i_{s3}) \ \text{and} \ (i_{s2} < i_{s1}), \ \text{or} \\
(i_{s2} < i_{s3}) \ \text{and} \ (i_{s2} > i_{s1})
\end{cases}
\tag{2}
$$

As a result, current-level comparison and logic decision are required to realize the function.
Figure 16 shows a three-input median circuit composed of three CSCU cells and three decision
logic blocks. The decision logic is realized simply by an AND-OR gate circuit that performs

$$
V_{ctr} = \textcircled{1} \cdot \textcircled{2} + \textcircled{3} \cdot \textcircled{4} \tag{3}
$$

where ①, ②, ③, and ④ denote the corresponding logic inputs, which are taken from
the comparison-result signals (A)-(F). Depending on these inputs, Eq. (3) sets the V_ctr of
each decision logic block to a low or a high level. A low V_ctr
turns on the transmission gate of the corresponding CSCU cell to pass the input
current; otherwise, the input current is blocked. The three-input MED filter cell is
thus realized. A capacitor C_filter is used to suppress
the switching noise caused by transition pulses.
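The decision rule of Eq. (2) can be exercised in software by applying it to each input in turn, which mirrors the three decision logic blocks of Fig. 16. This is a behavioural sketch under stated assumptions: the comparisons are ideal, the inputs are distinct, and the control polarity (V_ctr driven low when the condition of Eq. (2) holds for that input) follows the statement that a low V_ctr closes the transmission gate. The function name `median_of_three` is illustrative.

```python
def median_of_three(i_s1, i_s2, i_s3):
    """Return (i_out, v_ctr): the passed current and the three control levels."""
    inputs = (i_s1, i_s2, i_s3)
    v_ctr = []
    for a in range(3):
        b, c = [j for j in range(3) if j != a]
        # Eq. (2) applied to input a: it lies strictly between the other two.
        between = ((inputs[a] > inputs[b]) and (inputs[a] < inputs[c])) or \
                  ((inputs[a] < inputs[b]) and (inputs[a] > inputs[c]))
        v_ctr.append(0 if between else 1)   # low level closes the switch
    # Only cells whose switch is closed contribute to the common output.
    i_out = sum(i for i, ctr in zip(inputs, v_ctr) if ctr == 0)
    return i_out, v_ctr


first = median_of_three(9.0, 7.0, 10.0)     # currents in uA
second = median_of_three(3.0, 10.0, 5.0)
```

With distinct inputs exactly one control signal is low, so the common output carries the median current. When two inputs are equal no input satisfies the strict inequalities; in the chapter this limit appears as the 0.4-µA discriminability of the cell.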

In the circuit, an NMOS transistor size of (W/L)n = 5 µm/1 µm and a PMOS transistor size of
(W/L)p = 10 µm/1 µm are used for M1-M12. The inverter sizes are (W/L)n = 5 µm/0.35 µm and
(W/L)p = 20 µm/0.35 µm. The switch transistors M_swn and M_swp both have (W/L) = 20 µm/0.35 µm.
All transistors in the decision logic block are sized (W/L)n = 5 µm/0.35 µm and
(W/L)p = 10 µm/0.35 µm. The filter capacitance C_filter is 10 pF. The supply voltage Vdd
is 3.3 V. The input current signals i_s1, i_s2, and i_s3 reach their 10-µA peak value at
5 µs, 10 µs, and 15 µs, respectively. Figure 17 shows the three triangle waves and the
corresponding median output. The red line represents the MED output, which
tracks the median value of the three input currents well. Fig. 17 also shows that, when
two input values are close to each other, their difference must be larger than 0.4
µA to be resolved; this is the discriminability of the MED filter. However, small spikes
occur at the transition points.




Fig. 16. Three-input median cell. 




Fig. 17. The output response of the median filter for triangle waveforms. 

As Fig. 16 shows, the proposed three-input median cell has three input pins (i_s1, i_s2, and i_s3)
and a common output pin (i_out). By modifying the switch transistors and decision logic, the
MED cell can easily be changed to have three inputs and three outputs. The modified MED cell
then provides the maximum value i_maximum, the median value i_median, and the minimum value
i_minimum simultaneously. As a result, multiple modified MED cells can be organized to cooperate
in performing a 'sorting' function. The MED cell requires no critical components such as
operational amplifiers or precise voltage references. These properties make it easy to embed
the MED cell into a larger system.

4. Low-voltage arbitrary rank order extraction 

4.1 Principle of rank-order extraction 

Each of the WTA, LTA, and MED functions, however, is only a single-order operation. In 2002, a
low-voltage rank-order filter with a compact structure was designed (Cilingiroglu & Dake,
2002). The filter is based on a pair of multiple-winners-take-all circuits and a set of logic gates.
In this section, a new architecture with both arbitrary rank-order extraction and k-WTA
functionalities is described (Hung & Liu, 2002). An rth rank-order extraction identifies the rth
largest magnitude among the input variables. In the design, the circuit locates an arbitrary rank
order among a set of input voltages by setting different binary signals. A set of output voltages
Vo_1, Vo_2, ..., and Vo_M corresponds to the outputs of a rank-order extractor for a set of input
variables V_1, V_2, ..., and V_M. The output status D_ij of a two-input comparator is defined as

$$D_{ij} = \begin{cases} 1, & V_i > V_j \\ 0, & \text{otherwise} \end{cases} \qquad 1 \le i, j \le M,\ i \ne j \tag{4}$$

where M is the number of input variables. For convenience of description, a temporary
index S_i is defined as the total number of comparisons that the ith input variable wins against
the others. Thus, S_i is represented as

$$S_i = \sum_{j=1,\, j \ne i}^{M} D_{ij}, \qquad 1 \le i \le M \tag{5}$$



(6a) 
(6b) 

(6c) 

(6d) 

Thus, evaluating the left-hand side of (6) requires M(M-1) cooperating comparators for M
input variables to identify the rank order. Since D_ji is the complement of D_ij
($D_{ji} = \overline{D_{ij}}$), D_ji is replaced by $\overline{D_{ij}}$ on the right-hand side of (6).
The physical meaning is that if both the output of a comparator and its complement are available,
the total number of comparators can be reduced from M(M-1) to M(M-1)/2.

In this section, the comparator generates a unit current I_unit when input variable V_i is larger
than V_j. Thus, the index S_i in (5) is rewritten as



$$S_i = \sum_{j=1,\, j \ne i}^{M} D_{ij} I_{unit} = n I_{unit}, \qquad 1 \le i \le M,\ 0 \le n \le M-1 \tag{7}$$



where n is the number of comparisons won. If the inputs are arranged in ascending order of
magnitude, V_1, V_2, ..., V_M, satisfying V_1 < V_2 < ... < V_M, then
S_1 = 0, S_2 = I_unit, ..., S_M = (M-1)I_unit. Obviously, the minimum, next minimum, ..., and
maximum input variables can be found by checking the index S_i. The k-WTA function is
defined so that an output is logic high when

$$S_i \ge (M-k) I_{unit}. \tag{8}$$

For example, if the input variables are (0.5, 0.6, 0.9, 0.2, 0.4), the first variable 0.5 is larger
than the variables 0.2 and 0.4. Thus, the index S_1 is 2I_unit, meaning that this variable wins
against two other input variables among all comparisons. For the same reason, S_2 = 3I_unit,
S_3 = 4I_unit, S_4 = 0, and S_5 = I_unit. Therefore, the rank order among the input variables is
found by checking the index S_i. In this example, the output voltages (Vo_1, Vo_2, ..., Vo_5) of the
extractor are (0, 0, 1, 0, 0), (0, 1, 0, 0, 0), (1, 0, 0, 0, 0), and (0, 0, 0, 1, 0) for the maximum,
next maximum, median, and minimum operations, respectively, where "0" and "1" denote logic
low and high. Similarly, if the extractor is configured for the k-WTA function, the output
voltages (Vo_1, Vo_2, ..., Vo_5) of the circuit are (1, 1, 1, 1, 1), (1, 1, 1, 0, 1), (1, 1, 1, 0, 0),
..., and (0, 0, 1, 0, 0) for the 5-WTA, 4-WTA, 3-WTA, ..., and 1-WTA operations, respectively.
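
The index S_i and the two operating modes can be sketched behaviorally as follows. The function names are illustrative, currents are expressed in multiples of I_unit, and the k-WTA threshold is assumed to be inclusive (S_i >= (M-k)I_unit), which is what the worked example above requires.

```python
def win_counts(v):
    """S_i in units of I_unit: how many other inputs the ith input exceeds."""
    return [
        sum(1 for j, vj in enumerate(v) if j != i and vi > vj)
        for i, vi in enumerate(v)
    ]


def rank_order(v, n):
    """One-hot output marking the input that wins exactly n comparisons.

    n = 0 selects the minimum and n = len(v) - 1 selects the maximum.
    """
    return [1 if s == n else 0 for s in win_counts(v)]


def k_wta(v, k):
    """k-WTA output: logic high for every input with S_i >= M - k."""
    m = len(v)
    return [1 if s >= m - k else 0 for s in win_counts(v)]


if __name__ == "__main__":
    inputs = [0.5, 0.6, 0.9, 0.2, 0.4]
    print(win_counts(inputs))     # [2, 3, 4, 0, 1]
    print(rank_order(inputs, 4))  # maximum: [0, 0, 1, 0, 0]
    print(rank_order(inputs, 2))  # median:  [1, 0, 0, 0, 0]
    print(k_wta(inputs, 3))       # 3-WTA:   [1, 1, 1, 0, 0]
```

The sketch assumes distinct inputs; equal inputs would share a win count, a case the text does not discuss.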

4.2 Architecture of rank-order extraction 

The structure of the extractor is shown in Fig. 18 for five input variables (Hung & Liu, 2002).
There are a total of M(M-1)/2 comparators and M evaluation cells for M input variables.
Each comparator cell accepts two input signals, and the result of each comparison is fed
into the corresponding evaluation cells. In the first row of Fig. 18, the input V_1 is compared
with the other input variables, and each comparison generates the proper unit current I_unit.
These currents are summed in the Eval-1 cell if V_1 is larger than the other sample; otherwise,
the result of the comparison is fed into the evaluation cell of the other input. The connection
strategy is the same for the other input variables. Therefore, equation (7) is realized in this
architecture.

The signal V_choice in Fig. 18 selects the function of the circuit. V_choice is preset to logic
high to enable the rank-order operation; otherwise, the k-WTA function is enabled. The
binary signals sel_1, sel_2, and sel_3 determine which rank order or which k-WTA is located.
Based on the setting of the select signals (sel_1-3), the logic states of the evaluation cells
indicate which input variable belongs to this rank order. For example, in the seven-input
rank-order operation, the (sel_1, sel_2, sel_3) signals are set to logic (0, 0, 0) to find the
minimum variable; the settings (0, 1, 1) and (1, 1, 0) select the median and maximum
functions, respectively. Similarly, in the k-WTA operation, setting (sel_1, sel_2, sel_3) to
(0, 0, 1) and (1, 1, 0) gives the 6-WTA and 1-WTA functions, respectively.

Fig. 18. The architecture of arbitrary rank-order extractor for five input variables.

4.3 Circuit design 
4.3.1 1.2-V comparator

The comparator is a key element in Fig. 18. The auto-zero comparator shown in Fig. 19 is
designed to operate at a low supply voltage. To improve the speed of the comparator, the
succeeding gain stage is designed to operate in dynamic mode. First, in the auto-zero phase,
the input V_1 is sampled at the top plate of the capacitor C_s, and the MOS transistor M11 is
biased at the voltage V_bias. In the next phase, the comparison phase, the voltage at node E
becomes V_bias + (V_2 - V_1)C_s/(C_s + C_p). The deviation voltage is then amplified by
transistors M11 and M12. To reduce the power dissipation, the adjustable biasing voltage V_bias
is chosen just high enough to overcome the threshold voltage of a MOS transistor, and it is also
adjusted for comparator operation at different supply voltages. The succeeding transistors M13
and M14 provide the current to generate the proper voltage at node F. Depending on which
input voltage is larger, either node H or node G will be at logic high. The output node G of the
comparator and its complementary node H are fed into the next stage to generate the unit
currents I_large_1, I_large_2, I_small_1, and I_small_2. During the evaluation phase, the unit
currents I_large_1 and I_large_2 are present when V_1 is larger than V_2; otherwise, I_small_1
and I_small_2 are generated. The symbol representation of the comparator cell is shown at the
bottom right of Fig. 19.
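
The capacitive attenuation at node E can be made concrete with a short calculation. This is a sketch under stated assumptions: the bias voltage (0.9 V) and sampling capacitor (0.8 pF) are the values quoted in the measurement section, while the parasitic capacitance of 0.2 pF and the input voltages are hypothetical numbers chosen only for illustration.

```python
def node_e_voltage(v_bias, v1, v2, c_s, c_p):
    """Node E voltage in the comparison phase: V_bias + (V2 - V1) * Cs / (Cs + Cp)."""
    return v_bias + (v2 - v1) * c_s / (c_s + c_p)


if __name__ == "__main__":
    v_bias = 0.9      # V, bias voltage quoted in the measurement section
    c_s = 0.8e-12     # F, sampling capacitor quoted in the measurement section
    c_p = 0.2e-12     # F, hypothetical parasitic capacitance
    # A hypothetical 40 mV input difference is attenuated by Cs / (Cs + Cp) = 0.8.
    v_e = node_e_voltage(v_bias, v1=0.50, v2=0.54, c_s=c_s, c_p=c_p)
    print(f"node E = {v_e:.3f} V")
```

A larger C_p relative to C_s shrinks the deviation seen by the gain stage, which is why the parasitic capacitor limits the smallest distinguishable input difference.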
The function of the comparator shown in Fig. 19 is summarized as

$$V_1 > V_2: \quad I_{large\_1} = I_{large\_2} = I_{unit}, \quad I_{small\_1} = I_{small\_2} = 0$$

$$V_1 < V_2: \quad I_{large\_1} = I_{large\_2} = 0, \quad I_{small\_1} = I_{small\_2} = I_{unit}$$

where I_unit is the unit current of the PMOS transistor Mb

Fig. 19. 1.2-V auto-zero comparator, clock, and symbol representation.
4.3.2 Evaluation cell

Fig. 20. Evaluation cell.

The circuit of the evaluation cell is shown in Fig. 20. The MOS transistors M_gen and M_unit
reproduce the same unit current, which is equal to I_large_1, I_large_2, I_small_1, and
I_small_2 in Fig. 19. To find the various rank orders for all input signals, the cell must
identify the unit-current summation in (7) that arrives at the Out_com1 and Out_com2
terminals. It is not easy to identify an exact current value in a VLSI circuit. However,
whether the summation current S_i lies inside a valid range can be checked by the
criterion

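$$n I_{unit} - \delta_1 < S_i < n I_{unit} + \delta_2. \tag{9}$$

It is a reasonable and safe design to choose δ_1 = δ_2 = I_unit/2. Therefore, the dimensions of
these MOS transistors are designed as

$$\left(\frac{W}{L}\right)_{M_1} = \left(\frac{W}{L}\right)_{M_5} = 4\left(\frac{W}{L}\right)_{M_{unit}}, \qquad \left(\frac{W}{L}\right)_{M_2} = \left(\frac{W}{L}\right)_{M_6} = 2\left(\frac{W}{L}\right)_{M_{unit}},$$

$$\left(\frac{W}{L}\right)_{M_3} = \left(\frac{W}{L}\right)_{M_7} = \left(\frac{W}{L}\right)_{M_{unit}}, \qquad \left(\frac{W}{L}\right)_{M_4} = \left(\frac{W}{L}\right)_{M_8} = \frac{1}{2}\left(\frac{W}{L}\right)_{M_{unit}}$$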

where W is the channel width and L is the channel length. MOS transistors M_add1 and M4 realize
the δ_2 effect, and M8 realizes the -δ_1 effect. Depending on the setting of the sel_1-3 signals,
the transistors M_cnt_1-6 enable the corresponding binary-weight currents. The inverters inv4c-7
provide sufficient gain to amplify the difference between the currents that come from the
Out_com1-2 terminals and the binary-weight currents. This mechanism is similar to a
current comparator. In the upper row of Fig. 20, the extra PMOS transistor M_add1 generates
an extra unit current; therefore, the voltage V_out_h is always larger than or equal to V_out_l.
If V_choice is preset to 0, the dashed block in Fig. 20 resets V_out_l to 0, and the lower row
in Fig. 20 is disabled. The cell then checks only the criterion

$$S_i < n I_{unit} + \delta_2. \tag{10}$$

Thus, this is a k-WTA criterion.

Consider an example to describe the function of the evaluation cell. The number of input
variables is seven, and the sel_1-3 signals are set to (0, 0, 1) to find the next minimum input
variable. Since the next minimum is larger only than the minimum, only a single unit
current comes from the Out_com1-2 terminals of the corresponding evaluation cell. In the upper
row of Fig. 20, the sum of this unit current and the extra unit current (from M_add1) is larger
than the binary-weight current 1.5I_unit; therefore, V_out_h is logic 1. In the lower row, by
contrast, the unit current I_unit (which comes from the Out_com1-2 terminals) is smaller
than the binary-weight current 1.5I_unit; therefore, V_out_l is logic 0. The transistors M_id1
and M_id2 allow only the state (V_out_h, V_out_l) = (1, 0) to pull the corresponding output
(Vo_n, n = 1, ..., 7) up to logic 1. In all other cases, Vo_n is at logic 0 or in the open state.
Therefore, inspecting the logic state of Vo_n reveals which input variable belongs to the
desired rank order.
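
The two-row decision in this example can be mimicked with a small behavioral model. It is only a sketch: the function names are hypothetical, currents are in multiples of I_unit, and the binary-weight current is assumed to be (4·sel_1 + 2·sel_2 + sel_3 + 0.5)·I_unit, a generalization of the 1.5I_unit quoted for the setting (0, 0, 1).

```python
def binary_weight_current(sel, i_unit=1.0):
    """Assumed reference current: binary weights 4, 2, 1 plus a fixed half unit."""
    sel_1, sel_2, sel_3 = sel
    return (4 * sel_1 + 2 * sel_2 + sel_3 + 0.5) * i_unit


def evaluation_cell(s_i, sel, v_choice=1, i_unit=1.0):
    """Return 1 when the cell output is pulled high, else 0.

    s_i is the summed comparator current arriving at the cell.
    """
    i_bin = binary_weight_current(sel, i_unit)
    v_out_h = int(s_i + i_unit > i_bin)  # upper row: extra unit current added
    v_out_l = int(s_i > i_bin)           # lower row: no extra current
    if v_choice == 0:
        v_out_l = 0                      # lower row disabled (k-WTA mode)
    # Only the state (V_out_h, V_out_l) = (1, 0) pulls the output high.
    return int(v_out_h == 1 and v_out_l == 0)


if __name__ == "__main__":
    # Seven inputs, sel = (0, 0, 1): only the cell with a single win responds.
    print([evaluation_cell(s, (0, 0, 1)) for s in range(7)])
    # Same setting in k-WTA mode: every cell with at least one win responds.
    print([evaluation_cell(s, (0, 0, 1), v_choice=0) for s in range(7)])
```

Under these assumptions the rank-order mode acts as a window comparator around the selected win count, while the k-WTA mode keeps only the upper-row comparison.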
4.4 Measured results and design consideration

A seven-input experimental chip was fabricated using a 0.5 µm CMOS technology. The bias
voltage V_bias is set to 0.9 V in this design. The sampling capacitor C_s is 0.8 pF, and the
analog switches in the circuit are implemented by CMOS transmission gates. The
micrograph of the experimental chip is shown in Fig. 21, and the active area is 610 x 780
µm^2. An individual comparator cell was built on the chip for measuring the accuracy. The
supply voltages of the core circuit and the input/output pads were all set to 1.2 V. The
accuracy of the individual comparator was measured as roughly 40 mV; that is, the
resolution of the comparator was nearly five bits under a 1.2 V supply voltage. Figure 22(a)
shows the rank-order function, whereas Fig. 22(b) shows the k-WTA function. On
average, the accuracy of the whole circuit was approximately 150 mV. The performance of the
chip was degraded by many factors, such as the mismatch among comparator cells, the different
capacitances at the input terminals of the evaluation cells, and the clock feed-through error. Due
to these non-ideal effects, each rank-order function was completed in 20 µs. After the supply
voltage is increased to 1.5 V and the biasing voltage V_bias is adjusted properly, the performance
of the circuit can be improved. Including the power consumption of the input/output pads, the
static power consumption of the chip was 1.4 mW.

Fig. 21. Micrograph of the 1.2-V rank-order chip.

Fig. 22. The measurement results of (a) rank-order (b) k-WTA operations.
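
The "nearly five bits" figure can be checked with a one-line calculation. The sketch below assumes that the full-scale range equals the 1.2 V supply, which is an interpretation of the text rather than a stated measurement condition; the function name is illustrative.

```python
import math


def resolution_bits(full_scale_v, accuracy_v):
    """Equivalent resolution in bits for a given full-scale range and accuracy."""
    return math.log2(full_scale_v / accuracy_v)


if __name__ == "__main__":
    # Individual comparator: 40 mV accuracy over an assumed 1.2 V range.
    print(round(resolution_bits(1.2, 0.040), 2))  # 4.91
    # Whole circuit: 150 mV accuracy over the same assumed range.
    print(round(resolution_bits(1.2, 0.150), 2))  # 3.0
```

Under the same assumption, the 150 mV accuracy of the whole circuit corresponds to about three bits.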

Many factors such as precision, speed, process variation, and chip area must be considered
in the design of a low-power, low-voltage rank-order extractor.

1. Limitations of low voltage and low power

The average power consumption of the circuit is expressed by

$$P = P_{dynamic} + P_{static} + P_{short\_circuit} = f C V_{DD}^2 + (I + I_{leakage}) V_{DD} + Q_{sc} f V_{DD} \tag{11}$$

where f is the frequency, C is the capacitance in the circuit, V_DD is the supply voltage, I is the
standby current, I_leakage is the leakage current, and Q_sc is the short-circuit charge during
the clock transient period. To reduce the power consumption, the supply voltage
V_DD must be reduced, and the standby current in the comparator and evaluation cell must
be made as small as possible. In the mask layout, the clock and its complement are
generated locally to reduce delay and mismatch. Thus, the probability of a short-circuit current
occurring in the circuit is minimized.
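
The three terms of Eq. (11) can be evaluated numerically to see how the supply voltage enters each of them. All component values below are hypothetical placeholders chosen for illustration; they are not measurements of the chip, and the function name is an assumption.

```python
def average_power(f, c, v_dd, i_standby, i_leakage, q_sc):
    """Average power as the sum of dynamic, static, and short-circuit terms."""
    p_dynamic = f * c * v_dd ** 2
    p_static = (i_standby + i_leakage) * v_dd
    p_short_circuit = q_sc * f * v_dd
    return p_dynamic + p_static + p_short_circuit


if __name__ == "__main__":
    # Hypothetical values: 50 kHz clock, 10 pF switched capacitance,
    # 1 mA standby current, 1 nA leakage, 1 fC short-circuit charge per cycle.
    params = dict(f=50e3, c=10e-12, i_standby=1e-3, i_leakage=1e-9, q_sc=1e-15)
    for v_dd in (1.5, 1.2):
        print(v_dd, average_power(v_dd=v_dd, **params))
```

With numbers of this kind the static term dominates, which illustrates why the text stresses a small standby current in the comparator and evaluation cell.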

2. Speed and precision 

The accuracy of the comparators determines the resolution of the circuit. For the comparator
design, the smallest differential voltage that can be distinguished correctly is influenced by two
factors. One is the charge-injection error in the analog switches, and the other is the effect of
the parasitic capacitor C_p. These effects are reduced by enlarging the sampling capacitor C_s
and making the switch dimensions as small as possible. In the design, the response time τ of the
extractor is the sum of the auto-zero time τ_az, the comparison time τ_cmp, and the
evaluation time τ_eval:

$$\tau = \tau_{az} + \tau_{cmp} + \tau_{eval} \tag{12}$$

Reducing τ_az, τ_cmp, and τ_eval improves the response time τ. A minimum auto-zero
time τ_az is required to sample the input voltage correctly on the sampling capacitor C_s and to
bias the inverter properly in its high-gain region. Larger switches in Fig. 19 reduce the
auto-zero time τ_az. However, the clock feed-through error and charge-injection error are also
enlarged during the clock transition. Likewise, a smaller sampling capacitor C_s reduces τ_az.
Unfortunately, it also reduces the effective magnitude of the difference voltage; thus, the
comparator accuracy is degraded. The comparison time τ_cmp dominates the response time τ,
especially when the input levels are close to each other. Since the amplification in the
transition region of a CMOS inverter operated at a low supply voltage is not high enough, the
comparator must take a long time to identify which input variable has the larger level. The
evaluation time τ_eval is defined as the time interval from the moment the comparator cells
generate the proper currents until the extractor has finished finding the desired rank order.
The time τ_eval is a function of the current I_unit. The maximum number M of input variables is
also influenced by the current I_unit. Although reducing the magnitude of I_unit reduces the
power consumption, the relationship among τ_eval, I_unit, and M in this architecture is a
complicated function.
3. Process variation analysis 

With contemporary technology, process variation during fabrication cannot be completely
eliminated; as a result, mismatch error must be taken into account in VLSI circuit design. The
dimensional matching of the binary-weight MOS transistors in the evaluation cell (M1 - M8 in
Fig. 20) is an important factor for circuit operation. If the mismatch induces an error current
I_err whose magnitude exceeds half of the unit current I_unit, the decision of the evaluation
cell fails. Thus, a rough estimated constraint for I_err is

$$I_{err} < I_{unit}/2. \tag{13}$$
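
The effect of the half-unit constraint can be illustrated by checking which window of Eq. (9) a disturbed summation current falls into. This is a behavioral sketch with δ_1 = δ_2 = I_unit/2; the function name and the numerical error values are hypothetical.

```python
def detected_rank(s_current, m, i_unit=1.0):
    """Return the win count n whose window contains s_current, or None.

    The window is n*I_unit - I_unit/2 < s_current < n*I_unit + I_unit/2.
    """
    delta = i_unit / 2.0
    for n in range(m):
        if n * i_unit - delta < s_current < n * i_unit + delta:
            return n
    return None


if __name__ == "__main__":
    true_n = 3  # an input that wins three comparisons among seven inputs
    for i_err in (0.4, 0.5, 0.6):  # error current in units of I_unit
        print(i_err, detected_rank(true_n + i_err, m=7))
```

An error below half a unit leaves the decision intact, an error of exactly half a unit falls on the window boundary and matches no rank, and a larger error is read as the neighboring rank.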

5. Conclusion 

This chapter describes various nonlinear signal processing CMOS circuits, including a highly
reliable WTA/LTA circuit, a simple MED cell, and a low-voltage arbitrary rank-order extractor.
The discussion focuses on CMOS analog circuit design with reliability, programmability, and
low-voltage operation. Matching multiple identical cells within a single chip is a practical
problem in a conventional process; thus, highly reliable circuit design is indeed needed.
Low-voltage operation is also an important design issue as CMOS processes scale down further.
In the chapter, Section 1 introduces various CMOS nonlinear functions and related applications.
Section 2 describes the design of a highly reliable WTA/LTA circuit using a single analog
comparator. The analog comparator itself has an auto-zero characteristic to improve the overall
reliability. Section 3 describes a simple analog MED cell. Section 4 presents a low-voltage
rank-order extractor with a k-WTA function. The flexible and programmable functions are useful
features when the nonlinear circuit is integrated with other systems. Depending on the
application requirements, different design strategies must be adopted for these nonlinear signal
processing circuits to achieve the optimum performance. In state-of-the-art processes, small chip
area, low-voltage operation, low power consumption, high reliability, and programmability
remain important factors for these circuit realizations.

1. Introduction

The transconductor is a versatile building block employed in many analog and mixed-signal
circuit applications, such as continuous-time filters, delta-sigma modulators, variable-gain
amplifiers, and data converters. Its function is to perform voltage-to-current conversion.
Linearity is one of the most critical requirements in designing a transconductor. In particular,
delta-sigma modulators for high-resolution analog-to-digital converters need highly linear
transconductors to achieve the required signal-to-(noise+distortion) ratio.
Tuning ability is also mandatory in order to adjust the center frequency and quality
factor in filter applications.

Portable electronic equipment is the trend in consumer markets. Therefore, low power
consumption and low supply voltage have become the major challenges in designing
CMOS VLSI circuitry. However, designing a low-voltage, highly linear transconductor
requires considering many factors. The first factor is the linear input range, which is judged
by the constancy of the transconductance G_m, since the distortion of a transconductor is
determined by the ratio of output current to input voltage. The second factor is the control
voltage of the transconductor. This voltage can greatly affect the transconductance, linear
range, and power consumption. For example, when the control voltage increases, the
transconductance also increases, but the linear input range is reduced and the power
consumption is increased. Hence, the control voltage is critical in designing a transconductor
operated at a low supply voltage. The third factor is the symmetry of the two differential
outputs. If the transconductances of the positive and negative outputs are G_m+ = I_o+/V_i and
G_m- = I_o-/V_i, then how close G_m+ and G_m- are is a critical issue, where I_o+ is the positive
output current, I_o- is the negative output current, and V_i is the differential input voltage.
Asymmetry between them is the major cause of the common-mode distortion that occurs at the
outputs of the transconductor.

In general, the design of differential transconductors can be classified into triode-mode and
saturation-mode methods, depending on the operating region of the input transistors. A
triode-mode transconductor has better linearity as well as better single-ended performance. On
the other hand, a saturation-mode transconductor has better speed performance but exhibits only
moderate linearity. Furthermore, a single-ended saturation-mode transconductor suffers from
significant degradation of linearity. Several circuit design techniques for improving the
linearity of transconductors have been reported in the literature. The linearization methods
include: source degeneration using resistors or MOS transistors
[Krummenacher & Joehl, 1988; Leuciuc & Zhang, 2002; Leuciuc, 2003; Furth & Andreou,
1995], cross-coupling of multiple differential pairs [Nedungadi & Viswanathan, 1984;
Seevinck & Wassenaar, 1987], class-AB configuration [Laguna et al., 2004; Elwan et al., 2000;
Galan et al., 2002], adaptive biasing [Degrauwe et al., 1982; Ismail & Soliman, 2000;
Sengupta, 2005], constant drain-source voltages [Kim et al., 2004; Fayed & Ismail, 2005;
Mahattanakul & Toumazou, 1998; Zeki, 1999; Torralba et al., 2002; Lee et al., 1994;
Likittanapong et al., 1998], pseudo-differential stages [Gharbiya & Syrzycki, 2002], and shift
level biasing [Wang & Guggenbuhl, 1990].

Source degeneration using resistors or MOS transistors is the simplest method to linearize a
transconductor. However, it requires a large resistor to achieve a wide linear input range. In
addition, a MOS transistor used as a resistor exhibits considerable variation with process and
temperature, which degrades the linearity. Cross-coupling with multiple differential pairs is
designed only for balanced input signals. The class-AB configuration can achieve low power
consumption; on the other hand, its linearity is the worst because of the inherent class-AB
structure. The adaptive biasing method generates a tail current that is proportional to the
square of the differential input voltage to compensate for the distortion caused by the input
devices. However, the complexity of the squaring circuitry makes this technique hard
to implement. Holding the drain-source voltage of the input devices constant gives a simple
structure that can achieve better linearity with tuning ability. However, V_DS of the input
devices must be kept at a low voltage so that they remain in the triode region. Therefore, this
technique is difficult to implement at a low supply voltage. Hence, a new transconductor using
constant drain-source voltage for low-voltage applications is proposed to achieve low-voltage
operation, high linearity, and a large tuning range.

In Section 2, the basic operation and disadvantages of the linearization techniques are described.
The proposed new transconductor is presented in Section 3. The simulation results and
conclusion are given in Sections 4 and 5.

2. Linearization techniques 

In this section, common linearization techniques reported in the literature are reviewed.
The first is the transconductor using constant drain-source voltage. The second uses a
regulated cascode to replace the auxiliary amplifier. The third is the transconductor with
source degeneration using resistors or MOS transistors. The last is the linear MOS
transconductor with an adaptive biasing scheme. Besides introducing their theory and analysis,
the advantages and disadvantages of these linearization techniques are also discussed.

2.1 Transconductor using constant drain-source voltage 

The idea of a transconductor using constant drain-source voltage is to keep the input
devices in the triode region so that the output current is linearized. The schematic of this
method is shown in Fig. 1. Assuming that transistors M1 and M2 operate in the triode region,
that M3 and M4 are biased in the saturation region, and that channel-length modulation, body
effect, and other second-order effects are ignored, the drain current of M1 and M2 is given by

$$I_D = \beta\left[(V_{GS} - V_T)V_{DS} - \frac{V_{DS}^2}{2}\right] \tag{1}$$

where β = µ_n C_ox (W/L), V_GS is the gate-to-source voltage, V_T is the threshold voltage, and
V_DS is the drain-to-source voltage. If the two amplifiers in Fig. 1 are ideal amplifiers, then

$$V_{DS1} = V_{DS2} = V_C \tag{2}$$

Fig. 1. Transconductor using constant drain-source voltage

The transfer characteristic of this transconductor is given by

$$I_{out} = I_{out1} - I_{out2} = \beta V_C (V_{in1} - V_{in2}) \tag{3}$$

The transconductance value is

$$G_m = \beta V_C \tag{4}$$



In practice, it is difficult to implement an ideal amplifier in this circuit. However,
V_DS1 = V_DS2 = V_DS can still be forced by using two auxiliary amplifiers controlled by the
same V_C to keep V_DS at a constant value. The transfer characteristic of this transconductor
then becomes

$$I_{out} = I_{out1} - I_{out2} = \beta V_{DS}(V_{in1} - V_{in2}) \tag{5}$$

where V_GS1 = V_in1 and V_GS2 = V_in2. Therefore, the new transconductance value is

$$G_m = \beta V_{DS} \tag{6}$$

The linearity of this transconductor is moderate, and the circuit is easy to implement.
However, V_DS of the input devices must be small enough to keep the transistors in the triode
region. The following condition has to be satisfied:

$$V_{DS} < V_{GS} - V_T \tag{7}$$

On the other hand, the auxiliary amplifiers need to be designed carefully to reduce the overhead
of extra area and power.
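
The cancellation of the quadratic term that makes this transconductor linear can be checked numerically. The sketch below uses the triode drain-current expression of Eq. (1) for both input devices at the same V_DS; the function names and all device values (β, V_T, V_DS, input voltages) are hypothetical and chosen only for illustration.

```python
def triode_current(beta, v_gs, v_t, v_ds):
    """Triode-region drain current: beta * ((V_GS - V_T) * V_DS - V_DS**2 / 2)."""
    return beta * ((v_gs - v_t) * v_ds - v_ds ** 2 / 2.0)


def output_current(beta, v_in1, v_in2, v_t, v_ds):
    """Differential output current with both devices held at the same V_DS."""
    return triode_current(beta, v_in1, v_t, v_ds) - triode_current(beta, v_in2, v_t, v_ds)


def in_triode(v_gs, v_t, v_ds):
    """Triode condition: V_DS < V_GS - V_T."""
    return v_ds < v_gs - v_t


if __name__ == "__main__":
    beta, v_t, v_ds = 200e-6, 0.5, 0.1  # hypothetical device values (A/V^2, V, V)
    v_in1, v_in2 = 1.1, 0.9
    assert in_triode(v_in1, v_t, v_ds) and in_triode(v_in2, v_t, v_ds)
    i_out = output_current(beta, v_in1, v_in2, v_t, v_ds)
    # The V_DS**2 / 2 terms cancel, leaving beta * V_DS * (V_in1 - V_in2).
    print(i_out, beta * v_ds * (v_in1 - v_in2))
```

Because the quadratic terms are identical in both branches, the difference is exactly proportional to the input difference as long as both devices stay in the triode region.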



2.2 Transconductor using regulated cascode to replace auxiliary amplifier 

In Fig. 2(a), the regulating amplifier keeps V_DS of M1 at a constant value determined by V_C,
which is less than the overdrive voltage of M1. The amplifier places M3 in a current-voltage
feedback loop, thereby increasing the output impedance. The concept is to drive the gate of M3
by an amplifier that forces V_DS1 to be equal to V_C. Voltage variations at the drain of M3
therefore affect V_DS1 to a lesser extent because the amplifier "regulates" this voltage. With
smaller variations of V_DS1, the current through M1, and hence the output current, remains more
constant, yielding a higher output impedance [Razavi, 2001].

Fig. 2. (a) Basic triode transconductor structure (b) Simple RGC triode transconductor

Using a regulated cascode to replace the auxiliary amplifier is one solution that overcomes the
restrictions of Fig. 1. The circuit in Fig. 2(b), proposed in [Mahattanakul & Toumazou, 1998],
uses a single transistor, M5, to replace the amplifier in Fig. 2(a). This circuit is called a
regulated cascode, abbreviated as RGC. The RGC uses M5 to achieve gain boosting by increasing
the output impedance without adding more cascode devices. V_DS1 is calculated as follows.
Assuming that M5 is in the saturation region in Fig. 2(b), it can be shown that

$$I_C = \frac{\beta_5}{2}\left(V_{DS1} - V_C - V_T\right)^2 \tag{8}$$

From (6),

$$G_m = \beta_1 V_{DS1} = \beta_1\left(V_C + \sqrt{\frac{2 I_C}{\beta_5}} + V_T\right) \tag{9}$$

Thus, G_m can be tuned by using a controllable voltage source V_C or current source I_C.
However, in practice it is preferable to use a controllable voltage source V_C to lower the
power consumption, since V_DS1 varies only as a square-root function of I_C.

The simple RGC transconductor, which uses a single transistor to achieve gain boosting, reduces
the area and power consumed by the auxiliary amplifiers. However, it still has some
disadvantages. First, it leads to an excessively high supply-voltage requirement and also
produces an additional parasitic pole at the source of the transistors. Therefore, it cannot be
applied to low-supply-voltage designs. Second, the tuning range of V_DS1 is restricted. The
smallest value of V_DS1 is $\sqrt{2 I_C/\beta_5} + V_T$, obtained when V_C = 0; in other words,
V_DS1 cannot be set to zero. Owing to the restriction of (7), V_DS should be as low as possible,
and the best value is zero. Third, the V_T-dependent G_m may be a disadvantage because of
substrate noise and V_T mismatch problems [Lee et al., 1994].

Another RGC transconductor, suitable for low-voltage applications, was proposed in
[Likittanapong et al., 1998] and is shown in Fig. 3. The circuit overcomes the disadvantages
mentioned above by using a PMOS transistor operating in the saturation region as the
gain-boosting stage. Using this PMOS gain stage in the feedback path gives a wide
transconductance tuning range even at a low supply voltage. [Likittanapong et al., 1998]
notes that at the maximum input voltage, M3 may be forced into the triode region, especially
if the dimensions of M2 are not properly selected, resulting in a lower dynamic range. In
addition, β_2 may be chosen larger for a very low distortion transconductor. This means that
the tradeoff between linearity and bandwidth of the transconductor is controlled by β_2, which
should therefore be selected as a compromise between these two characteristics for a given
application.

V_DS1 is calculated as follows. Assuming that M3 is in the saturation region in Fig. 3,

$$I_C = \frac{\beta_3}{2}\left(V_C - V_{DS1} - V_T\right)^2 \tag{10}$$

From (6),

$$G_m = \beta_1 V_{DS1} = \beta_1\left(V_C - \sqrt{\frac{2 I_C}{\beta_3}} - V_T\right)$$

This shows that V_DS1 can be set to zero when $V_C = \sqrt{2 I_C/\beta_3} + V_T$. Therefore,
this transconductor has a wider tuning range than the simple RGC transconductor and is capable
of working at a low supply voltage (3 V). However, it still has some drawbacks. The major
drawback is the tuning ability: for example, it is difficult to control
$V_C - \sqrt{2 I_C/\beta_3} - V_T$ precisely if V_DS1 is set to zero. The minor drawback is that
G_m depends on V_T, which may also cause substrate noise and V_T mismatch problems
[Lee et al., 1994].






Fig. 3. RGC transconductor with PMOS gain stage 



2.3 Transconductor using source degeneration 

A simple differential transconductor is shown in Fig. 4(a). Assuming that Mi and M2 are in 
saturation and perfectly matched, the drain current is given by 



i D =f(v GS -v T ) 2 



(11) 
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The transfer characteristic using (5) is given by 



J„, = l *«n -loua = J^WZvJl-^- = J2PI SS V.\1 j-^i r 



(12) 



, where V, = ( V;„i -V,„ 2 ) 

If Vgs is large enough, the higher linearity can be achieved. Unfortunately, it can not be used 
in the low- voltage application and the linear input range is limited. Simplest techniques to 
linearize the transfer characteristic of MOS transconductor is the one with source 
degeneration using resistors as shows in Fig. 4(b). The circuit is described by 
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A transfer characteristic derived from (13) is given by 
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The transconductance G,„ is 



1 + S.R 



(15) 



where g„, is the transconductance of transistor Mi and M2. 

Note that in (14) the nonlinear term depends on $V_i - R I_{out}$ rather than on $V_i$. Higher
linearity can be achieved when $R \gg 1/g_m$. The disadvantage of this transconductor is that
a large resistor value is needed to maintain a wide linear input range. Since
$G_m \approx 1/R$, a high transconductance requires a small resistor. Hence, there is a
tradeoff between a wide linear input range and a high transconductance, and it is
determined mainly by the resistor.
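
The tradeoff can be illustrated numerically. The following is a minimal sketch, assuming that (15) has the usual resistive-degeneration form $G_m = g_m/(1 + g_m R)$; the device transconductance of 1 mA/V and the resistor values are illustrative assumptions, not values from this chapter.

```python
def degenerated_gm(gm: float, r: float) -> float:
    """Effective transconductance G_m = g_m / (1 + g_m * R)."""
    return gm / (1.0 + gm * r)


# Assumed device transconductance (illustrative only): 1 mA/V.
GM_DEVICE = 1e-3

for r in (100.0, 1e3, 10e3, 100e3):
    g_eff = degenerated_gm(GM_DEVICE, r)
    # As R grows past 1/g_m, G_m approaches 1/R and becomes small.
    print(f"R = {r:8.0f} ohm   G_m = {g_eff * 1e6:8.2f} uS   1/R = {1e6 / r:8.2f} uS")
```

Once $R$ is much larger than $1/g_m$ (1 kΩ in this sketch), the effective transconductance follows $1/R$ closely, so a wider linear range is paid for with a lower $G_m$.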
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Fig. 4. (a) Simple differential MOS transconductor (b) MOS transconductor with resistive 
source degeneration 
Another method to linearize the transfer characteristic of a MOS transconductor is to
replace the degeneration resistor with two MOS transistors operating in the triode
region. The circuit is shown in Fig. 5. Notice that the gates of transistors M3 and M4 are connected to
the differential input voltage rather than to a bias voltage. To see that M3 and M4 are generally
in the triode region, consider the case of equal input signals ($V_{in1} = V_{in2}$), which results in

$$V_x = V_y = V_{in1} - V_{GS1} \qquad (16)$$

Therefore, the drain-source voltages of M3 and M4 are zero, while $V_{GS}$ of M3 and M4
equals that of M1 and M2. Owing to (7), M3 and M4 are indeed in the triode region. Assuming
M3 and M4 operate in the triode region, their small-signal drain-source resistance is
given by

(17) 



It must be noted that in this circuit the effect of varying $V_{GS}$ of M1 and M2 cannot be ignored,
since the drain currents are not fixed to a constant value. The small-signal source resistance
of M1 and M2 is given by

$$r_{s1} = r_{s2} = \frac{1}{g_{m1}} = \frac{1}{\beta_1 (V_{GS1} - V_{T1})} \qquad (18)$$

Using the small-signal T model, the small-signal output current, $i_{o1}$, is equal to

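$$i_{o1} = \frac{V_{in1} - V_{in2}}{r_{s1} + r_{s2} + (r_{ds3} \| r_{ds4})} \;\Rightarrow\; i_{o1} = \frac{2\beta_1 \beta_3}{\beta_1 + 4\beta_3}(V_{GS1} - V_{T1})(V_{in1} - V_{in2}) \qquad (19)$$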

Assuming M1 is in the saturation region, the drain current of M1 is given by

$$I_{D1} = \frac{1}{2}\beta_1 (V_{GS1} - V_{T1})^2 \qquad (20)$$
Substituting (20) into (19) leads to

The transconductance $G_m$ is

Linearity is enhanced (assuming $r_{ds3} \gg r_{s1}$) compared with that of a simple differential
pair because transistors operating in the triode region exhibit higher linearity than the source
resistances of transistors operating in the saturation region. When the input signal increases,
the small-signal resistance of one of the two parallel triode transistors, M3 or M4, is reduced.
The reduced resistance results in lower linearity and larger
transconductance. As discussed in [Krummenacher & Joehl, 1988], if a proper size ratio
$\beta_1/\beta_3$ is chosen, a balance between high linearity and stable transconductance can be
achieved. The optimum size ratio $\beta_1/\beta_3$ for the best linearity
depends slightly on the quiescent overdrive voltage, $V_{GS} - V_T$. A size ratio of
$\beta_1/\beta_3 = 6.7$ is used to achieve the best linearity.

According to (22), the transconductance can be tuned by changing $I_{SS}$ and the size ratio $\beta_1/\beta_3$.
Nevertheless, the nonlinearity error is up to 1% for $I_{out}/I_{SS} < 80\%$. Better linearity is
required to achieve a THD of -60 dB or less in some filtering applications [Kuo &
Leuciuc, 2001].
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Fig. 5. Transconductor with source degeneration using MOS transistors 

2.4 Transconductor using adaptive biasing 

The transconductor using adaptive biasing is shown in Fig. 6. All transistors are assumed to
operate in the saturation region, and the channel-length modulation effect is neglected. First,
with transistor M3 absent, the output currents as functions of the two input voltages $V_{in1}$ and $V_{in2}$ are

$$I_1 = \frac{\beta}{2}(V_{GS1} - V_T)^2, \qquad I_2 = \frac{\beta}{2}(V_{GS2} - V_T)^2 \qquad (23)$$
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, where Iss is a tail current and equals Ig. 

The adaptive biasing technique uses a tail current containing an input-dependent
quadratic component to cancel the nonlinear term in (23). To this end, the circuit in Fig. 6
modifies the tail current by adding transistor M3. The tail current becomes

where $I_S$ is the tail current of the differential pair and $I_C$ is the compensating tail current that cancels
the nonlinear term.

Therefore, the transfer characteristic becomes
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Fig. 6. Transconductor with adaptive biasing 

3. New transconductor 

Conventional structures that rely on a constant drain-source voltage, such as the RGC transconductor with
an NMOS or PMOS gain stage, cannot operate at 1.8 V or below. The main reason is that, under a low
supply voltage, the auxiliary amplifier cannot provide enough gain to keep the drain-source
voltage constant. Therefore, we propose a triode transconductor that uses a new structure to replace
the auxiliary amplifier. Fig. 7 shows the proposed triode transconductor structure.
Transistors M5, M7, M9, and M11 form a two-stage amplifier that replaces the auxiliary
amplifier. The first stage consists of M9 with the active load M11, and the second stage
consists of M5 with the active load M7. The first and
second stages exhibit gains equal to
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Fig. 7. Proposed triode transconductor 
Therefore, the overall gain is

$$A_v = A_1 A_2 = g_{m9}(r_{o9} \| r_{o11})\, g_{m5}(r_{o5} \| r_{o7})$$

The proposed transconductor is shown in Fig. 8.
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Fig. 8. The proposed transconductor 

Provided that the gain is large enough to keep transistors M1 and M2 in the
triode region, the drain currents of M1 and M2 are given by



The transfer characteristic is given by 
$$I_{out} = I_{out1} - I_{out2} = \beta V_{DS1}(V_{in1} - V_{in2}) \qquad (32)$$

where $\beta_1 = \beta_2 = \beta$, $V_{T1} = V_{T2}$, and $V_{DS1} = V_{DS2}$. Assuming that current $I_9$ flows from M11 through
M9 and that M9 is in the saturation region, $V_{DS1}$ is given by (33):

$$V_{DS1} = V_C - V_{T7} - V_{GS3} \qquad (33)$$

According to (32),

$$I_{out} = \beta V_{DS1}(V_{in1} - V_{in2}) = \beta (V_C - V_{T7} - V_{GS3})(V_{in1} - V_{in2}) \qquad (34)$$

The transconductance $G_m$ is

$$G_m = \beta (V_C - V_{T7} - V_{GS3}) \qquad (35)$$

From (35), the transconductance can be tuned by the control voltage $V_C$. To keep M1 and M2 in
the triode region, relation (36) must be satisfied.

$$V_{DS1} < V_{GS1} - V_{T1} \qquad (36)$$

Substituting (33) into (36) gives

$$V_C - V_{T7} - V_{GS3} < V_{GS1} - V_{T1} \;\Rightarrow\; V_C < V_{GS1} + V_{GS3} - (V_{T1} - V_{T7}) \qquad (37)$$
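
Relations (35) and (37) can be checked together for a candidate bias point. The sketch below is only illustrative: the function names are hypothetical, and the values of $\beta$, the threshold voltages, and the gate-source voltages are assumptions chosen for the example, not extracted from the simulated design.

```python
def triode_gm(beta: float, v_c: float, v_t7: float, v_gs3: float) -> float:
    """Transconductance from (35): G_m = beta * (V_C - V_T7 - V_GS3)."""
    return beta * (v_c - v_t7 - v_gs3)


def stays_in_triode(v_c: float, v_gs1: float, v_gs3: float,
                    v_t1: float, v_t7: float) -> bool:
    """Condition (37): V_C < V_GS1 + V_GS3 - (V_T1 - V_T7)."""
    return v_c < v_gs1 + v_gs3 - (v_t1 - v_t7)


# Assumed, illustrative bias point (not taken from the chapter).
BETA = 1e-3            # A/V^2
V_T1 = V_T7 = 0.5      # V
V_GS1, V_GS3 = 0.9, 0.1  # V

for v_c in (0.65, 0.70, 0.75):
    gm = triode_gm(BETA, v_c, V_T7, V_GS3)
    ok = stays_in_triode(v_c, V_GS1, V_GS3, V_T1, V_T7)
    print(f"V_C = {v_c:.2f} V   G_m = {gm * 1e6:6.1f} uS   triode: {ok}")
```

With these assumed numbers the input transistors leave the triode region once $V_C$ reaches 1.0 V, which bounds the usable tuning range from above.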

The proposed transconductor is suitable for low supply voltages, and we choose 1.8 V to
achieve a wide linear range. Moreover, M9 is needed to provide negative feedback that keeps
the drain-source voltage of M1, $V_{DS1}$, constant. This new structure provides enough gain
to keep $V_{DS1}$ constant at a 1.8 V supply voltage. It has a low control voltage $V_C$ between
0.69 V and 0.72 V and a large transconductance tuning range, depending on the application.
In addition, its simple structure saves area.

4. Simulation results 

The circuit in Fig. 8 was designed in the TSMC 0.18 µm CMOS process with a
single 1.8 V supply voltage and simulated with HSPICE. Fig. 9 shows the transfer curve from input voltage
to output current at $V_C$ = 0.7 V. The curve is linear when the
input voltage varies from -1 V to 1 V. The slope of the curve in Fig. 9 equals the transconductance plotted in
Fig. 10. To verify the performance of the proposed transconductor, we define the
transconductance error (equation 39) as a measure of the linearity of the transconductor's output
current. The transconductance error is less than 1% within a ±0.9 V input voltage, so the
linear input range is up to 1.8 V.

$$TE(\%) = \frac{G_m(V_{in}) - G_m(0)}{G_m(0)} \times 100 \qquad (39)$$
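
As a sketch of how a linear input range can be read off a simulated $G_m$ curve, the code below assumes that the transconductance error in (39) is the relative deviation of $G_m$ from its value at zero input, expressed in percent. The $G_m$ curve is synthetic (a made-up quadratic roll-off), and the function name `linear_range` is hypothetical.

```python
def linear_range(v_in, gm_values, limit_pct=1.0):
    """Width of the contiguous input span around 0 V whose error stays below limit_pct."""
    zero = min(range(len(v_in)), key=lambda i: abs(v_in[i]))
    gm_zero = gm_values[zero]
    # Assumed error definition: |G_m(V) - G_m(0)| / G_m(0) * 100.
    err = [abs(g - gm_zero) / gm_zero * 100.0 for g in gm_values]
    lo = hi = zero
    while lo > 0 and err[lo - 1] < limit_pct:
        lo -= 1
    while hi < len(v_in) - 1 and err[hi + 1] < limit_pct:
        hi += 1
    return v_in[hi] - v_in[lo]


# Synthetic data: input sweep in 0.1 V steps and an assumed quadratic G_m droop.
GM0 = 326e-6
v_sweep = [i / 10 for i in range(-12, 13)]
gm_curve = [GM0 * (1.0 - 0.012 * v * v) for v in v_sweep]

print(f"linear input range = {linear_range(v_sweep, gm_curve):.1f} V")
```

The same routine could be applied to a sweep exported from a circuit simulator; only the two input lists would change.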
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Fig. 9. V-I transfer characteristic 
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Fig. 10. The simulated transconductance at Vc=0.7V 

Fig. 11 shows how the drain-source voltages of the input transistors M1 and M2, $V_{DS1}$ and
$V_{DS2}$, change with the input voltage. Within a ±1 V input voltage, $V_{DS1}$ and $V_{DS2}$ are very
small. According to equation (40), $V_{DS1}$ and $V_{DS2}$ are small enough that transistors M1 and
M2 remain in the triode region. Once the input voltage exceeds ±1 V, $V_{DS1}$ and $V_{DS2}$
increase rapidly, and M1 and M2 enter the saturation region. In other
words, once M1 and M2 enter the saturation region, the proposed transconductor cannot
maintain high linearity.

$$V_{DS} < V_{GS} - V_T \qquad (40)$$



When $V_C$ is set between 0.69 V and 0.72 V, the linear input range is up to 2.6 V and the
transconductance error is less than 1%. The smallest transconductance is 3.4 µS, with a linear
input range of 1.2 V, at $V_C$ = 0.720 V. The highest transconductance is 542 µS, with a linear
input range of 1.4 V, at $V_C$ = 0.690 V. Table 1 lists the linear input range and the
transconductance for different values of $V_C$. The proposed transconductor therefore achieves a
large tuning range.
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Fig. 11. The drain-source voltage of input transistor Mi and M2 



Vc (V)    Linear input range (V)    Transconductance (µS)
0.690     1.4                       542
0.695     1.8                       434
0.700     1.8                       326
0.705     2.2                       219
0.710     2.4                       122
0.715     2.6                       42
0.720     1.2                       3.4

Table 1. Vc versus Linear input range
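
The tuning range implied by Table 1 can be summarized with a few lines of arithmetic. The sketch below simply re-enters the table values; the variable names are illustrative.

```python
# (Vc in V, linear input range in V, transconductance in uS), copied from Table 1.
table1 = [
    (0.690, 1.4, 542.0),
    (0.695, 1.8, 434.0),
    (0.700, 1.8, 326.0),
    (0.705, 2.2, 219.0),
    (0.710, 2.4, 122.0),
    (0.715, 2.6, 42.0),
    (0.720, 1.2, 3.4),
]

gm_values = [row[2] for row in table1]
tuning_ratio = max(gm_values) / min(gm_values)    # ratio of largest to smallest G_m
widest = max(table1, key=lambda row: row[1])       # row with the widest linear range
monotonic = all(a > b for a, b in zip(gm_values, gm_values[1:]))

print(f"G_m tuning ratio: {tuning_ratio:.0f}:1")
print(f"widest linear range: {widest[1]} V at Vc = {widest[0]} V")
print(f"G_m falls monotonically as Vc rises: {monotonic}")
```

In the tabulated data the transconductance falls monotonically as Vc rises from 0.690 V to 0.720 V, while the linear input range peaks at Vc = 0.715 V.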

Fig. 12 plots the simulated THD as a function of the input frequency and input signal
amplitude. The best THD is achieved at low input voltage and low
frequency. When $V_C$ is 0.7 V, the THD of the proposed transconductor is below -60 dB
for a 0.7 Vpp input at 100 kHz.
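
For readers who want to reproduce a THD figure from sampled output current, the following is a minimal sketch. It assumes a memoryless cubic model of the transconductor, $i = g_1 v + g_3 v^3$, with made-up coefficients; it is not the authors' simulation setup, and the function names are hypothetical.

```python
import math


def harmonic_amplitude(samples, k):
    """Amplitude of the k-th harmonic of one full period of samples (direct DFT)."""
    n = len(samples)
    re = sum(s * math.cos(2 * math.pi * k * i / n) for i, s in enumerate(samples))
    im = sum(s * math.sin(2 * math.pi * k * i / n) for i, s in enumerate(samples))
    return 2.0 * math.hypot(re, im) / n


def thd_db(samples, max_harmonic=9):
    """THD in dB: RMS sum of harmonics 2..max_harmonic relative to the fundamental."""
    fund = harmonic_amplitude(samples, 1)
    dist = math.sqrt(sum(harmonic_amplitude(samples, k) ** 2
                         for k in range(2, max_harmonic + 1)))
    return 20.0 * math.log10(dist / fund)


# Assumed model parameters (illustrative): linear and cubic coefficients.
G1 = 326e-6   # A/V
G3 = -1e-6    # A/V^3
AMP = 0.35    # V peak, i.e. a 0.7 Vpp input
N = 256

v = [AMP * math.sin(2 * math.pi * i / N) for i in range(N)]
i_out = [G1 * x + G3 * x ** 3 for x in v]
print(f"THD = {thd_db(i_out):.1f} dB")
```

For a pure cubic nonlinearity only the third harmonic is present, so the numerical result can be cross-checked against the closed form $|g_3| A^3/4$ divided by $g_1 A + 3 g_3 A^3/4$.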
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Fig. 12. Simulated THD for different input frequencies 

Fig. 13 shows the linearity of the transconductor for three linearization techniques. The
transconductor using source degeneration with a resistor is shown in Fig. 4(b), and the
transconductance in Fig. 13(a) is tuned by using different resistors. The transconductor using
source degeneration with MOS transistors is shown in Fig. 5, and the transconductance in
Fig. 13(b) is tuned by using different size ratios $\beta_1/\beta_3$. The transconductor using adaptive
biasing is shown in Fig. 6, and the transconductance in Fig. 13(c) is tuned by using different
compensating tail currents $I_C$. Fig. 14 shows the simulation results of the proposed technique
and the other techniques. Fig. 14(a) is the full plot for the different linearization techniques. From
Fig. 14(b) it can easily be seen that the linearity achieved by the proposed technique is
better than that of all other implementations.
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Fig. 14. Simulated transconductance for four linearization techniques (a) Full plot (b) Detail 

The simulated THD of the output differential current versus the input signal amplitude for
the four linearized transconductors is plotted in Fig. 15. The proposed transconductor
achieves a THD of less than -61 dB for a 0.7 Vpp input voltage, 11 dB better than the one using
source degeneration with a resistor, 24 dB better than the one using source degeneration
with MOS transistors, and 31 dB better than the one using adaptive biasing, at the same input range.
Table 2 shows the power consumption of the four linearized transconductors at the same
transconductance. Because power consumption changes with the transconductance,
the same transconductance is chosen for each configuration in the comparison. Table
3 shows the power consumption of the proposed transconductor at different
transconductances.
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Fig. 15. Simulated THD at 1MHz for the four linearized transconductors 
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using MOS 


Source 
degeneration 
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Power (mW) 
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Table 2. The power consumption of four linearized transconductors 
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Power (mW) 


G m (uA/V) 
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122 
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42 


0.720 


0.733 


3.4 



Table 3. The power consumption at different transconductances 

Table 4 compares the performance with that of other transconductors at low supply
voltage (under 2 V). The transconductor in [Fayed & Ismail 2005] also uses a constant
drain-source voltage. It modifies the basic constant drain-source voltage structure and uses a
moderate amplifier. The proposed transconductor modifies the auxiliary amplifiers to
obtain high gain under a low supply voltage.

The layout, including the proposed transconductor, common-mode feedback, and bandgap, is
shown in Fig. 16. The proposed transconductor uses the STC pure 1.8 V linear I/O library in a
0.18 µm CMOS process. The chip area is 0.516 mm².
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Table 4. Comparison table 




Fig. 16. The layout of proposed transconductor 
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5. Conclusion 

The proposed low-voltage, highly linear, and tunable triode transconductor achieves a
wide linear input range of up to 2.4 V. The total harmonic distortion is -60 dB with a 0.7 Vpp
input voltage. The design uses TSMC 0.18 µm CMOS technology with a 1.8 V supply voltage.
Moreover, it exhibits a large $G_m$ tuning range from 3.4 µS to 542 µS while keeping a wide
linear input range. Finally, the performance comparison with other linearization techniques shows
that the proposed technique achieves better linearity, a wider tuning range, and a wider linear
input range.
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1. Introduction 



The performance of microprocessors has progressed dramatically up to the present day.
Today, almost all computer systems use reduced instruction set computer (RISC)
architectures, whereas about 30 years ago complex instruction set computer (CISC)
architectures were used in almost all computer systems. The advantages and
successes of RISC architectures are attributable to their simplified structures.
Conventional CISC architectures invariably included
numerous and varied instructions, each of which could execute a complicated
multi-step operation. For that reason, CISC architectures were useful in assembler-based
programming environments and in systems with small amounts of memory. However, such
complicated architectures hinder increases in clock frequency and in a processor's processing
power.

Therefore, RISC architectures, which use simple architectures based on single-step
instruction sets, have been developed. RISC architectures offer higher clock
frequency, smaller implementation area, and lower power consumption than
conventional CISC architectures. Observation of many
examples reveals that, in circuit implementations, a simple structure is best for increasing
overall performance. That principle is also applicable to programmable devices.
If clock-by-clock reconfigurable devices are used, even a single instruction set computer 
(SISC) can be implemented onto them. A single instruction set computer is one in which a 
processor has only a single instruction. During production, various single instruction set 
computers are prepared: a single instruction set computer with an AND logic function, a 
single instruction set computer with an adder function, and so on. These processor units are 
implemented at necessary times and at necessary places of a programmable device. In CISC 
and RISC architectures, the hardware is fixed. Its operations are switched using software 
commands, as portrayed in Fig. 1(a). In contrast, in a single instruction set computer, the 
operation changes are executed by hardware reconfigurations, as shown in Figs. 1(b) and 
1(c). Therefore, in a single instruction set computer, a processor with a certain function itself 
can be reconfigured to another processor with another function. 

The implementation of such single instruction set computers provides the following 
advantages under programmable device implementations. A single instruction set computer 
with the simplest architecture can operate at the highest clock frequency among all 
processor architectures. In RISC architectures, many selectors to change functions are 
implemented; such selectors have a certain delay. However, single instruction set computers
require no selector for function changes. Moreover, inherent circuit complexity
invariably increases the load capacitance and wiring capacitance at each circuit point. Large
capacitance always decreases the maximum clock frequency. Therefore, the clock
frequencies of the simple architectures of single instruction set computers are higher than those
of RISC and CISC architectures. As a result, the performance of single instruction set
computers is superior to that of multi-instruction set computers.

Fig. 1. RISC architecture and SISC architecture.

Figure 1(d) shows that, since such a single instruction set computer can be implemented in a
small area, large-scale parallel computation can be achieved. Thereby, the total performance can
be increased dramatically. However, to increase processing power using this concept,
programmable devices must offer both high-speed reconfiguration and enough
reconfiguration contexts to sustain continuous high-speed reconfiguration.

Currently, field programmable gate arrays (FPGAs) are widely used for many applications
(1)-(3). Such FPGAs are always implemented with an external ROM. At power-on, a
configuration context is downloaded from the external ROM to an internal configuration
memory. However, such FPGAs have been shown to be unsuitable for dynamic
reconfiguration applications because their serial-transfer configuration mechanism requires a
reconfiguration time of more than several milliseconds.
On the other hand, high-speed reconfigurable devices have been developed, e.g. DRP chips 
(4). They include reconfiguration memories and a microprocessor array on a single chip. The 
internal reconfiguration memory stores the reconfiguration contexts of 16 banks, which can 
be substituted for one another during a clock cycle. Consequently, the arithmetic logic unit 
can be reconfigured on every clock cycle in a few nanoseconds. Unfortunately, increasing 
the internal reconfiguration memory while maintaining the number of processors is 
extremely difficult. 

As another type of rapidly reconfigurable device, optically reconfigurable gate arrays (ORGAs)
have been developed, which combine a holographic memory and an optically
programmable gate array VLSI, as portrayed in Fig. 2 (5)-(9). Many configuration contexts
can be stored in a holographic memory. They can then be read out optically and
programmed optically onto a gate array VLSI using photodiodes, perfectly in parallel.
Therefore, high-speed configuration is possible in addition to numerous reconfiguration
contexts. Such ORGA architectures open the possibility of implementing
single instruction set computers.

This chapter introduces a VLSI design of an ORGA architecture: a dynamic ORGA 
architecture suitable for implementations of single instruction set computers. 

2. ORGA architecture 

Fig. 2. Overall construction of an optically reconfigurable gate array (ORGA).

An overview of an Optically Reconfigurable Gate Array (ORGA) is portrayed in Fig. 2. An
ORGA comprises a gate-array VLSI (ORGA-VLSI), a holographic memory, and a laser diode
array. The holographic memory stores reconfiguration contexts. A laser array is mounted on
top of the holographic memory and is used to address the reconfiguration contexts in the
holographic memory. One laser corresponds to one configuration context. When one laser is
turned on, its beam propagates into a corresponding area of the holographic memory
at a certain angle so that the holographic memory generates a certain diffraction pattern. A
photodiode array of a programmable gate array on an ORGA-VLSI receives that pattern as a
reconfiguration context. Then, the ORGA-VLSI functions as the circuit of the configuration
context. The reconfiguration time of such an ORGA architecture reaches nanosecond order
(5),(6). Therefore, very-high-speed context switching is possible. In addition, since the
storage capacity of a holographic memory is extremely high, numerous configuration
contexts can be stored in a holographic memory. Therefore, the ORGA architecture can
dynamically implement single instruction set computers.

3. Dynamic ORGA architecture 

A configuration context is optically applied in ORGAs. In ORGA-VLSIs, a detection
circuit must be used in addition to a programmable gate array. The detection circuit is called
an optical reconfiguration circuit. Such an optical reconfiguration circuit is connected to
each programming point of a programmable gate array. Therefore, the number of
reconfiguration circuits can be as large as the number of programming points in FPGAs. Consequently, reducing the
implementation area of optical reconfiguration circuits is extremely important in ORGAs.
In major ORGAs (5),(6), each optical reconfiguration circuit consists of a photodiode, a 
refresh transistor, and a single-bit static configuration memory, as portrayed on the left side 
of Fig. 3. A reconfiguration procedure is initiated by charging the junction capacitance of the 
photodiode using refresh transistors. After charging, an optical configuration context is 
provided from a holographic memory and is received on the photodiodes. The electric 
charge in the junction capacitance of each light-received photodiode is discharged and the 
electric charge in the junction capacitance of each photodiode receiving no light is retained. 
The resultant difference is detectable by sensing the voltage between the anode and cathode 
of the photodiode. The sensed information is temporarily stored on a single-bit static 
configuration memory. Then, the context information is provided to each programming 
point of a gate array. Using this technique, a configuration context can be retained 
indefinitely in the ORGA-VLSI so that the state of the gate array can be maintained 
statically. 
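
The reconfiguration procedure described above (precharge, optical exposure, sensing, storage) can be mimicked with a small behavioral model. This is only a sketch: the idealized full discharge, the sensing threshold, and the convention that an illuminated photodiode stores a logic 1 are assumptions made for illustration, and the function name is hypothetical.

```python
def optical_reconfigure(context_bits, v_precharge=3.3, v_threshold=1.65):
    """Behavioral model of one reconfiguration cycle for a row of photodiodes.

    context_bits[i] is True when photodiode i is illuminated by the hologram.
    Returns the bits latched into the single-bit static configuration memories.
    """
    # Step 1: the refresh transistors charge every junction capacitance.
    nodes = [v_precharge] * len(context_bits)
    # Step 2: illuminated photodiodes discharge; dark photodiodes keep their charge.
    nodes = [0.0 if lit else v for v, lit in zip(nodes, context_bits)]
    # Step 3: sense each node voltage and latch the result
    # (assumed polarity: discharged node -> logic 1).
    return [v < v_threshold for v in nodes]


pattern = [True, False, True, True, False]
stored = optical_reconfigure(pattern)
print(stored)
```

Because every photodiode is precharged, exposed, and sensed at the same time, the whole context is written in a single parallel step rather than bit by bit.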
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Fig. 3. Optical reconfiguration circuits of static and dynamic techniques. 

However, the static configuration memory prevents realization of high gate count ORGA- 
VLSIs. The static configuration memory comes to occupy about 25% of the area of an entire 
VLSI chip. Moreover, using the memory function for storage during an indefinite period can 
be considered as over-capacity for implementation in single instruction set computers 
because a processor of a single instruction set computer is dynamically reconfigured. For 
that reason, its lifetime is very short. In addition, the configuration information is stored on 
a holographic memory; the information can therefore be read out anytime. Because of that 
feature, even when long-term functions are required, a certain refresh cycle enables such 
function implementations. Therefore, a Dynamic Optically Reconfigurable Gate Array 
(Dynamic ORGA) architecture without a long-term storable configuration memory was 
proposed (7). A photodiode invariably has junction capacitance. Therefore, the junction
capacitance can maintain the state of a gate array for a certain time. The dynamic ORGA
completely removes the static configuration memory used to store a context and uses the junction
capacitance of photodiodes as dynamic configuration memory, as shown on the right side of
Fig. 3. Under the concept of single instruction set computers, the junction capacitance
of photodiodes is sufficient to retain the state of a gate array. This architecture is called a
dynamic ORGA architecture. The dynamic ORGA architecture is a very advanced ORGA
architecture in terms of gate density in ORGAs.

Fig. 4. An island-style gate array constructed by optically reconfigurable logic blocks
(ORLBs), optically reconfigurable switching matrices (ORSMs), and optically reconfigurable
I/O blocks (ORIOBs).

4. VLSI design with 51,272 gates 

This section describes the design of a 51,272-gate DORGA-VLSI. The 51,272-gate-count
DORGA-VLSI chip was designed using a 0.35 µm standard complementary metal
oxide semiconductor (CMOS) process. The basic functionality of the DORGA-VLSI is
fundamentally identical to that of currently available field programmable gate arrays
(FPGAs). The DORGA-VLSI takes the form of an island-style gate array, that is, a fine-grain gate array.



4.1 Photodiode cell design

The depletion depth of a photodiode between an N-well and a P-substrate is always deeper
than that of a photodiode between an N-diffusion and a P-substrate. However, the
minimum size of a photodiode between an N-well and a P-substrate is always larger than
that of a photodiode between an N-diffusion and a P-substrate. Since an ORGA requires
many photodiodes, reducing the implementation area is very important. For that reason,
photodiodes were constructed between the N-diffusion and the P-substrate. The acceptance
surface size of the photodiode is 8.8 × 9.5 µm². In addition, the photodiode cell size is 21.0 ×
16.5 µm². The cell was designed as a full custom design. The fourth metal layer is used for
guarding transistors from light irradiation; the other three layers were used for wiring.
Technology                                       0.35 µm double-poly four-metal CMOS process
Chip size [mm²]                                  14.2 × 14.2
Supply voltage [V]                               Core 3.3, I/O 3.3
Photodiode size [µm²]                            9.5 × 8.8
Horizontal distance between photodiodes [µm]     28.5-42
Vertical distance between photodiodes [µm]       12-21
Number of photodiodes                            170,165
Number of logic blocks                           1,508
Number of switching matrices                     1,589
Number of I/O bits                               272
Gate count                                       51,272

Table 1. Specifications of a high-density DORGA.

4.2 Optically reconfigurable logic block 

A block diagram of an optically reconfigurable logic block of the DORGA-VLSI chip is
presented in Fig. 5. Each optically reconfigurable logic block consists of 2 four-input
one-output look-up tables (LUTs), 10 multiplexers, 8 tri-state buffers, and 2 delay-type flip-flops
with a reset function. The input signals from the wiring channel, which are applied through
some switching matrices and wiring channels from optically reconfigurable I/O blocks, are
transferred to LUTs through eight multiplexers. The LUTs are used for implementing
Boolean functions. The outputs of an LUT and of a delay-type flip-flop connected to the LUT
are connected to a multiplexer. A combinational circuit or a sequential circuit can be selected
by changing the multiplexer, as in FPGAs. Finally, outputs of the multiplexers are connected
to the wiring channel again through eight tri-state buffers. Each four-input
one-output LUT, multiplexer, and tri-state buffer has 16 photodiodes, 2 photodiodes, and 1
photodiode, respectively. In all, 58 photodiodes are used for programming an optically
reconfigurable logic block. The optically reconfigurable logic block can be reconfigured
perfectly in parallel. The CAD layout is depicted in Fig. 6. This is a standard-cell based
design. The cell size is 294.0 × 186.5 µm². Wiring between cells was executed using the first
to the third metal layers while avoiding the aperture area of the photodiode cell. The
optically reconfigurable logic block design is based on a standard cell design, except for
custom designs of transmission gate cells and photodiode cells. The photodiodes are arranged
at 42.0 µm horizontal intervals and at 12.0-21.0 µm vertical intervals.

Fig. 5. Block diagram of an optically reconfigurable logic block (ORLB).

Fig. 6. CAD layout of the optically reconfigurable logic block (ORLB).
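
The figure of 16 photodiodes per four-input LUT follows from the fact that a four-input Boolean function has 2^4 = 16 truth-table entries, one configuration bit each. The sketch below models such a LUT in software; the bit ordering (input a as the least significant index bit) and the function names are assumptions made for illustration.

```python
def program_lut4(func):
    """Build the 16 configuration bits of a 4-input LUT from a Boolean function."""
    return [func(i & 1, (i >> 1) & 1, (i >> 2) & 1, (i >> 3) & 1) for i in range(16)]


def lut4(config_bits, a, b, c, d):
    """Evaluate a 4-input, 1-output LUT: the inputs select one of 16 stored bits."""
    if len(config_bits) != 16:
        raise ValueError("a four-input LUT needs exactly 16 configuration bits")
    index = (d << 3) | (c << 2) | (b << 1) | a
    return config_bits[index]


# Two example contexts: a 4-input AND and a 4-input parity (XOR) function.
and_bits = program_lut4(lambda a, b, c, d: a & b & c & d)
xor_bits = program_lut4(lambda a, b, c, d: a ^ b ^ c ^ d)

print(lut4(and_bits, 1, 1, 1, 1), lut4(and_bits, 1, 0, 1, 1))
print(lut4(xor_bits, 1, 0, 0, 0), lut4(xor_bits, 1, 1, 0, 0))
```

Switching the same LUT from AND to parity only requires loading a different 16-bit pattern, which in a DORGA corresponds to illuminating a different set of the LUT's photodiodes.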



4.3 Optically reconfigurable switching matrix 

Similarly, the switching matrices are optically reconfigurable. The block
diagram of the optically reconfigurable switching matrix is portrayed in Fig. 7. Its basic
construction is the same as that used by Xilinx Inc. Four-directional switching matrices with
48 transmission gates were implemented in the gate array. Each transmission gate can be
considered as a bi-directional switch. A photodiode is connected to each transmission gate;
it controls whether the transmission gate is closed or not. Based on that capability,
four-direction switching matrices can be programmed as 48 optical connections. The CAD layout
is portrayed in Fig. 8. The cell size is 177.0 × 186.5 µm². As with the ORLBs, wiring was
executed using the first to the third metal layers, thereby avoiding the aperture area of the
photodiode cell. The optically reconfigurable switching matrix was designed using
custom cells of photodiode cells and transmission gate cells, except for some buffers. The
photodiodes are arranged at 28.5 µm horizontal intervals and at 12.0-21.0 µm vertical intervals.

Fig. 7. Block diagram of an ORSM.

Fig. 8. CAD layout of an ORSM.

4.4 Gate array 

Figure 4 depicts the gate array structure. Table 1 presents its specifications. The gate array
was designed using the Design Compiler logic synthesis tool and the Apollo place and route
tool (Synopsys Inc.). The ORGA-VLSI chip consists of 1,508 optically reconfigurable logic
blocks (ORLB), 1,589 optically reconfigurable switching matrices (ORSM), and 272 optically
reconfigurable I/O bits (ORIOB). Each optically reconfigurable logic block is surrounded by
wiring channels. In this chip, one wiring channel has eight connections. Switching matrices
are located on the corners of optically reconfigurable logic blocks. Each connection of the
switching matrices is connected to a wiring channel.

The acceptance surface size of the photodiode and the photodiode-cell size, including an optical
reconfiguration circuit, are 8.8 × 9.5 µm² and 21.0 × 16.5 µm², respectively. The photodiode
cells were arranged at 28.5-42.0 µm horizontal intervals and at 12.0-21.0 µm vertical
intervals; in all, 170,165 photodiodes were used. The fourth metal layer is used for guarding
transistors from light irradiation; the other three layers were used for wiring.

4.5 Reconfiguration performance 

The retention time and configuration time of the photodiode memory architecture in a 
DORGA-VLSI were estimated experimentally using another DORGA-VLSI chip. That other chip 
was fabricated using the same CMOS process and has identical photodiode construction and 
characteristics. Therefore, although a 51,272 DORGA-VLSI chip has never been fabricated, 
its characteristics were measured using the other DORGA-VLSI chip. As a result, the 
retention time of the photodiode was measured as longer than 45 s. That retention time is 
much longer than that of current DRAMs. Consequently, the storage time is sufficient for 
the implementation of single instruction set computers. Additionally, the product of the 
photodiode response time and laser power for each photodiode was measured as 
$T_{\mathrm{reconfiguration}} \cdot P_{\mathrm{laser}} = 12.7$ pJ. That measurement 
demonstrates that nanosecond-order configuration is possible. 
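
The trade-off implied by this energy product can be sketched numerically. The following Python snippet is only an illustration: it assumes the measured product of 12.7 pJ stays constant over the power range of interest, and the function name and the example power levels are hypothetical, not values from the measurement.

```python
import math

# Product of photodiode response time and laser power reported in the text.
ENERGY_PER_PHOTODIODE_J = 12.7e-12  # 12.7 pJ


def configuration_time_s(laser_power_per_photodiode_w: float) -> float:
    """Estimate the photodiode response time for a given optical power.

    Assumes the time-power product is constant, which is a simplification.
    """
    if laser_power_per_photodiode_w <= 0:
        raise ValueError("laser power must be positive")
    return ENERGY_PER_PHOTODIODE_J / laser_power_per_photodiode_w


if __name__ == "__main__":
    # Hypothetical optical power levels per photodiode, in milliwatts.
    for power_mw in (1.0, 5.0, 12.7):
        t_ns = configuration_time_s(power_mw * 1e-3) * 1e9
        print(f"{power_mw:5.1f} mW per photodiode -> {t_ns:.2f} ns")
```

Under this assumption, a few milliwatts delivered to each photodiode corresponds to a response time of a few nanoseconds, which is consistent with the nanosecond-order claim above.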

5. Conclusion 

This chapter has introduced and explained important concepts related to single instruction 
set computers. Such single-instruction set computers constitute an acceleration method used 
with microprocessor operations. To implement them, clock-by-clock dynamically 
reconfigurable devices are desired. However, using current VLSI technologies, simultaneous 
realization of fast reconfiguration and numerous reconfiguration contexts is impossible. To 
realize such clock-by-clock dynamically reconfigurable devices, another technology must be 
developed. As one possibility, this chapter has introduced and described an optically 
reconfigurable gate array VLSI. Currently, the gate count and performance of such 
ORGA-VLSIs are insufficient. Nevertheless, such architecture presents the possibility of 
overcoming current VLSI limitations. Realizing a device to overcome those limitations 
remains a subject for future work. 

1. Introduction 

Over the course of the past three decades, we have witnessed dramatic changes in our 
lifestyles. This is attributed to an unprecedented revolution of information technology (IT). 
The key element of the IT revolution is the continuing advancement of semiconductor 
technology. A major driving force of semiconductor technology lies in silicon. The silicon 
semiconductor has been applied to logic chips as well as memory chips for various 
applications. Meanwhile, the silicon memory has been at the center of an ongoing battle to 
manufacture the smallest, highest density, and most innovative product. Since their 
invention in the early 1970s, silicon memory devices have advanced at a remarkable pace. 
Silicon based memories such as dynamic random access memory (DRAM), static random 
access memory (SRAM), and Flash memory have been crucial elements for the 
semiconductor chip industry in the areas of density, speed, and nonvolatility, respectively. 
An important growth engine is scaling, which has enabled multiple devices to be integrated 
within a given area, resulting in an exponential increase in density and a decrease in bit-cost 
(Moore, 1965). The traditional scaling approach, however, is now confronting physical and 
technical challenges toward the end-point of the international technology roadmap for 
semiconductors (ITRS), indicating that the revenue from downscaling will diminish as 
scaling slows. Thus, an entirely new concept is required to ensure that silicon memory 
technology remains competitive. To meet this stringent requirement, this chapter 
explores a new paradigm of memory technology. 

An ideal memory device should satisfy three requirements: high speed, high density, and 
nonvolatility. Unfortunately, a memory satisfying all three requirements has yet to be 
developed. Memory devices have consequently advanced by pursuing just one of these virtues, 
and appear in many different forms. SRAM dominates high-speed on-chip caches for 
advanced logic; DRAM serves applications that need high-density, high-speed computation, 
but its data is volatile; and Flash memory is widely used for high-density, nonvolatile 
data storage. Therefore, if a single memory transistor can perform different memory 
functions, a paradigm shift from 'scaling' to 'multifunction' can continue the evolution 
of silicon technology. This chapter introduces the prototype of such a fusion memory, 
named unified-random access memory (URAM), which can simplify device 
architecture, reduce power consumption, increase performance, and cut bit-cost. 

2. Operational principle of URAM 

URAM is composed of a single memory transistor, which must be of the smallest cell size. It 
can perform nonvolatile functions or high-speed operations according to the set of 
operational biases. In other words, circuit designers can specify URAM to be Flash memory 
or DRAM in order to comply with their specifications. Before discussing the details of 
URAM, each underlying operation principle is briefly introduced. 

2.1 Flash memory operation 

Advances in high-quality, ultra-thin oxides have paved the way for nonvolatile 
silicon-oxide-nitride-oxide-silicon (SONOS) memory devices, which have replaced 
conventional floating-gate memory (Brown & Brewer, 1998). Fig. 2-1 shows the SONOS 
device structure and the program/erase operations. The device has a multilayer gate 
dielectric stack consisting of tunnel oxide/nitride/control oxide (O/N/O), and the charges 
are stored in discrete traps in the nitride layer sandwiched between the upper and lower 
oxide barriers. The stored charges are positive or negative depending on whether a negative 
or positive voltage is applied to the gate electrode. If a positive programming voltage is 
applied to the gate, electrons quantum-mechanically tunnel from the inverted channel 
through the tunnel oxide and are stored in the deep-level traps in the nitride layer. 
During erasing, holes are injected into the traps in the nitride in a manner similar to the 
program operation. The data is identified by the difference in the drain current. Once the 
charges are stored, the information is retained for up to 10 years with $10^6$ to $10^7$ 
program/erase cycles. Because of this superior data retention, SONOS memory is 
called nonvolatile memory. From the perspective of speed, however, writing requires 
a few to a few tens of microseconds, which might be too long for transferring high-density 
data. Thus, SONOS has mainly been utilized for portable applications, such as MP3 players, 
digital cameras, and memory sticks. 

Fig. 2-1. Operational principle of a SONOS Flash memory: (a) schematic of the SONOS structure 
and (b) drain current versus gate voltage characteristics for two data states. The information 
is stored in the form of nitride-trapped charges. The polarity and amount of charge stored in 
the nitride layer determine the threshold voltage. The data is distinguished by measuring the 
drain current at a given voltage. Once the charges are stored, the data is retained for over 
ten years, so Flash memory is referred to as nonvolatile memory. 

2.2 Capacitorless 1T-DRAM operation 

In conventional one-transistor/one-capacitor DRAM (1T/1C DRAM), Moore's Law tends to 
break down as device scaling advances. While the cell transistors continue to scale, the cell 
capacitors cannot shrink much because they must store a detectable amount of charge, 
which corresponds to a minimum cell capacitance of 30 fF/cell. The resulting size mismatch 
between the transistors and capacitors leads to complexity in the fabrication process. In 
2001, the densest and cheapest DRAM, called capacitorless 1T-DRAM or zero-capacitor RAM 
(Z-RAM), was developed (Okhonin et al., 2001). The capacitorless 1T-DRAM replaces the 
large capacitor, which is complicated to fabricate, with a floating-body capacitor. 
It exploits inherent properties, known as floating body effects or history effects, of 
transistors made on silicon-on-insulator (SOI) substrates. Floating body effects are 
generally considered parasitic by circuit designers because they cause current overshoot 
and are difficult to model and implement in circuit simulators (Gautier, 1997). While most 
efforts have been made to suppress these effects, Okhonin et al. found that they can be 
used to store information temporarily. Fig. 2-2 illustrates the principle of the 
capacitorless 1T-DRAM. During programming, the impact ionization process generates 
electron-hole pairs. While the electrons exit the channel through the drain, the holes are 
repelled by the drain, charging the body. Since the body is isolated vertically by the 
energy band offsets of the buried oxide and gate oxide, and laterally by the built-in 
potential between the n+ source/drain and the p-type body, the holes are confined and 
stored inside the floating body, as shown in Fig. 2-2. During erase, a negative drain 
voltage pulls the holes out of the floating body. The information is identified by turning 
on the transistor and measuring the current flow. More current flows in the programmed 
state because the positive body charges contribute to lowering the channel potential. 
Since holes can disappear by recombination in the programmed state and can be generated by 
band-to-band tunneling or thermal generation in the erased state, the data is volatile. 
However, as the generation and removal of holes take only a few nanoseconds, the 
capacitorless 1T-DRAM can be embedded for high-speed applications such as the caches of 
microprocessors, digital signal processors (DSP), and systems-on-chip (SoC). 




Fig. 2-2. Operational principle of a capacitorless 1T-DRAM: (a) schematic of the floating body 
structure, (b) energy band diagram of the capacitorless 1T-DRAM, and (c) drain current versus 
gate voltage characteristics for two data states. The information is stored in the form of 
floating body charges. The excess holes inside the floating body increase the drain current. 
Since the stored charges disappear within a second, it is referred to as DRAM. 

2.3 Operation principle of URAM 

The basis of URAM lies in the difference between the inherent operational biases of Flash and 
capacitorless 1T-DRAM (Han et al., 2007). Fig. 2-3 shows the operational bias domain for the 
two memory modes. For the erase operation, the two erase bias regions are distinct. For the 
program operation, even though the two regions partially overlap, the Flash memory utilizes 
relatively higher biases than the capacitorless 1T-DRAM. The overlapping region might 
cause the two modes to disturb each other, a problem that is addressed in Section 5. If 
proper biases are selected, the two functions can work without disturbing each other. In 
order to realize two functions in a single transistor, an O/N/O gate dielectric is 
integrated onto a floating body transistor. Relatively higher voltages are used to activate 
the Flash memory mode, whereas relatively lower voltages are used to activate the 
capacitorless 1T-DRAM mode. Once the mode of URAM is determined, the operational 
biases are selected accordingly. 



Fig. 2-3. (a) Operational bias domain of URAM, plotted as gate voltage versus drain voltage 
with the program and erase regions of the Flash and capacitorless 1T-DRAM modes, and (b) 
schematic device structures (gate, buried oxide, substrate) and program mechanisms of the 
two functions (Flash program and 1T-DRAM program). The inherent difference stems from the 
distinctive operational domains, which allow independent functions in a single memory 
transistor. 

The operational sequence is presented in Fig. 2-4. The memory block is first selected, and 
the operation mode is then decided. If the nonvolatile mode is chosen, the Flash operation is 
activated. Similarly, the capacitorless 1T-DRAM is activated if the high-speed mode is 
needed. When the mode changes from Flash to capacitorless 1T-DRAM, the cell transistors in 
the selected block should be initialized to a threshold voltage of 0.2 V. If the threshold 
voltages are not initialized and remain high, a greater gate voltage is required to obtain 
the same gate overdrive voltage. The high gate voltage stresses the gate oxide, which 
gradually increases the threshold voltage. On the other hand, if the initialized threshold 
voltage is small or even negative, excess holes can be generated even at zero gate voltage 
(off-state), since the carrier supply is sufficient to trigger impact ionization; this can 
cause drain disturbance. It should be noted that the impact ionization process used for the 
program operation of the capacitorless 1T-DRAM can cause undesired charge trapping in the 
O/N/O layer. Here, the undesirable threshold voltage shift caused by the capacitorless 
1T-DRAM program is referred to as a soft-program. Since soft-programming causes unstable 
operation, the threshold voltage should be periodically monitored to find out whether the 
cells have suffered from it. The memory block is re-initialized if the cells fail the 
verification test. This verification and re-initialization loop is essential but 
time-consuming. A method to minimize and, ultimately, eliminate this loop is discussed in 
Section 5. 

Fig. 2-4. Operational sequence diagram for URAM. The mode is selected according to the 
designer's demand. Since the program mechanisms of the two modes partially overlap, 
a verification and re-initialization loop is inserted in the high-speed mode. 
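
The sequence described above can be sketched as a small behavioral model. The Python code below is a minimal illustration, not the authors' controller: only the 0.2 V initialization target comes from the text, while the class and function names and the verification tolerance are hypothetical.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List


class Mode(Enum):
    FLASH = "nonvolatile"
    DRAM = "high-speed"


VT_INIT_V = 0.2        # initialization target stated in the text
VT_TOLERANCE_V = 0.05  # hypothetical verification tolerance


@dataclass
class Block:
    vt: List[float]            # per-cell threshold voltages in volts
    mode: Mode = Mode.FLASH


def initialize(block: Block) -> None:
    """Reset every cell in the block to the initialization threshold voltage."""
    block.vt = [VT_INIT_V] * len(block.vt)


def verify(block: Block) -> bool:
    """Return True if no cell has drifted beyond the tolerance (no soft-program)."""
    return all(abs(v - VT_INIT_V) <= VT_TOLERANCE_V for v in block.vt)


def select_mode(block: Block, mode: Mode) -> None:
    """Switch the block's mode; initialize when going from Flash to 1T-DRAM."""
    if mode is Mode.DRAM and block.mode is Mode.FLASH:
        initialize(block)
    block.mode = mode


def periodic_check(block: Block) -> bool:
    """Run the verification loop in high-speed mode.

    Returns True if the block had to be re-initialized.
    """
    if block.mode is Mode.DRAM and not verify(block):
        initialize(block)
        return True
    return False
```

For example, a block whose cells hold Flash data at 1.8 V and 2.0 V is reset to 0.2 V by `select_mode(block, Mode.DRAM)`, and a later drift of one cell to 0.4 V makes `periodic_check(block)` re-initialize the block.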



3. Device fabrication and various quantum substrates 

3.1 Various quantum substrates for URAM 

To date, silicon-on-insulator (SOI) substrates have been utilized for the capacitorless 
1T-DRAM. As embedded DRAM (eDRAM) now occupies more than 50% of the total chip area, 
and advanced processors have started to adopt SOI, the capacitorless 1T-DRAM made on 
SOI substrates is highly attractive for embedded memory. However, since bulk substrates 
still hold a significant share of the market, a chip that achieves the floating body effect 
on a bulk substrate would benefit fully from bulk substrate technology. Indeed, major memory 
manufacturers are reluctant to adopt SOI substrates for their stand-alone memory products, 
mainly because of cost. Therefore, the capacitorless 1T-DRAM fabricated on bulk substrates 
is explored for not only embedded memory but also stand-alone memory applications. In this 
section, the various quantum substrates, in particular bulk substrates engineered with a 
quantum energy band structure, are introduced and the device fabrication process is 
illustrated. 

Fig. 3-1. Various templates and their energy band diagrams for excess hole storage. The SOI, 
SOSC, and SONW are potential barrier types, and the SOSG is a potential well type. Unlike 
the SOI and SOSC, the SONW and SOSG confine holes with shallow trench isolation (STI) 
oxide in the lateral direction. 

The quantum substrates used for device fabrication and their corresponding energy 
band diagrams for hole storage are compared in Fig. 3-1. In the SOI substrate, the 
excess holes are vertically confined between the tunnel oxide and the buried oxide and are 
horizontally isolated by the built-in potential barrier between the n+ source/drain and the 
p-type body. Next, three methods for forming the floating body in bulk substrates are 
introduced. The hetero-epitaxial growth of semiconductors can imitate the energy band 
lineup of SOI. The introduction of carbon (C) into the silicon substrate enlarges the 
energy bandgap (Kim & Osten, 1997). Thus, the sequential growth of Si$_{1-y}$C$_y$ and Si 
on the bulk wafer can mimic the SOI substrate. Here, Si$_{1-y}$C$_y$ plays the role of the 
buried oxide. This substrate is named SOSC, an abbreviation of silicon-on-silicon carbon. 
Similar to the SOI substrate, the tunnel oxide and the valence band barrier at 
Si/Si$_{1-y}$C$_y$ confine holes in the vertical direction, and the built-in potential at 
the junction boundary confines them in the horizontal direction. Deep n+ ion implantation 
into the p-type bulk substrate forms a buried n-type well structure (Ranica et al., 2005). 
The n-type well and p-type body form a built-in potential barrier that prevents the holes 
from flowing out to the substrate terminal. This template is named SONW, an abbreviation 
of silicon-on-n+ well. Whereas the holes are horizontally isolated by the junction barrier 
in SOI and SOSC, the SONW confines holes with shallow trench isolation (STI) oxide. In 
order to avoid an electrical short between the n+ source/drain and the n+ well, the n+ well 
should be buried much deeper than the junction depth of the source/drain. This requirement 
inevitably imposes a minimum space between the two junctions, and that open space should 
be filled by the STI oxide. The aforementioned three substrates, SOI, SOSC, and SONW, 
vertically confine holes with a potential barrier. A potential well can store the excess 
holes just as a potential barrier does. Similar to the SOSC preparation, the introduction 
of germanium (Ge) into the silicon substrate reduces the energy bandgap. Thus, the 
sequential epitaxial growth of Si$_{1-x}$Ge$_x$ and Si forms a potential well (Ni & 
Hansson, 1990). In order to avoid the loss of the excess holes via recombination at the 
source/drain junction, the buried Si$_{1-x}$Ge$_x$ is placed under the source/drain 
junction boundary. As a result, the STI oxide blocks the escape of stored holes along the 
lateral direction. This silicon-on-silicon germanium substrate is referred to as SOSG. In 
addition to the fundamental interest in the well-type storage medium, Si$_{1-x}$Ge$_x$ is 
more frequently studied in the literature than Si$_{1-y}$C$_y$ and has already been adopted 
for strained-silicon technology in mass production, so SOSG technology might be more 
practical. 

3.2 Device fabrication 

There are two common types of Flash memory array architecture, NAND and NOR, named after 
the logical form of the cell configuration. The cell layout of URAM is the same as 
that of NOR-type Flash because a drain voltage must be applied to each memory cell to 
trigger impact ionization. The cell layout of URAM is shown in Fig. 3-2. The gates of 
the cells are coupled by row lines, and their drains are coupled by column lines. Since the 
individual memory cells are connected in parallel, random access is allowed. NOR 
architecture generally has one source contact per two neighboring cells, shared between 
them, thereby reducing the chip area. Some types of URAM, however, cannot use the 
shared source contact. While the shared source is possible for SOI and SOSC, SONW and 
SOSG require a separate source contact for every cell because each cell must be isolated by 
the STI oxide. The cross-sectional schematic along the bit-line direction in Fig. 3-3 shows 
that the source can be shared in SOI and SOSC. If the source were shared in SONW and SOSG, 
however, the lateral migration of excess holes could disturb the body charges of the 
neighboring cell. In other words, every cell should be isolated by the STI oxide and have 
its own source line. As a result, the layout efficiency of SOI and SOSC is better than that 
of SONW and SOSG. 




Fig. 3-2. Two types of URAM configuration: (a) a shared source line uses one contact for two 
cells and (b) divided source lines require an individual contact for each cell. 

Fig. 3-3. Schematics of the cells along the word line direction. Whereas the SOI and SOSC 
use the shared source line, the SONW and SOSG should utilize the divided source line. 

The schematic of the process flow is shown in Fig. 3-4 (Han et al., 2009). Except for SOI, 
the substrates (SOSC, SONW, and SOSG) utilize a bulk silicon wafer. Whereas SOI itself 
provides an intrinsic floating body, bulk substrates require energy band engineering to 
form an extrinsic floating body. Deep n+ ion implantation is carried out for the SONW, 
Si$_{1-y}$C$_y$/Si is epitaxially grown for SOSC, and Si$_{1-x}$Ge$_x$/Si is epitaxially 
grown for SOSG. After the various types of templates are prepared, the subsequent processes 
are similar. A photolithography process with a 0.18 µm design rule is applied for channel 
definition. The photoresist is then trimmed down to a line width of 30 nm by plasma ashing. 
The silicon is etched by reactive ion etching (RIE), resulting in a 30 nm wide fin-shaped 
channel. High density plasma (HDP) oxide is deposited, planarized by chemical mechanical 
polishing (CMP), and partially recessed by diluted HF until the upper part of the fin is 
exposed. The remaining lower part of the fin is covered by the STI oxide, and the exposed 
upper part of the fin becomes the active area. The gate dielectric stack, tunneling 
oxide/nitride/control oxide, is formed, and in-situ doped n+ polysilicon for the gate is 
subsequently deposited. After gate patterning, source/drain implantation and activation 
are carried out, followed by forming gas annealing. 

Fig. 3-4. Process flow of the URAM. After the quantum substrates for the floating body are 
prepared, the subsequent process flow is identical. Whereas SOI itself provides an intrinsic 
floating body, bulk substrates require energy band engineering to form an extrinsic 
floating body. Deep n+ ion implantation is carried out for the SONW, Si$_{1-y}$C$_y$/Si 
is epitaxially grown for SOSC, and Si$_{1-x}$Ge$_x$/Si is epitaxially grown for SOSG. 

Fig. 3-5. Tilted view of the SOI URAM (upper), and cross-sectional views of four types of 
URAM (lower). 

A tilted scanning electron microscopy (SEM) image and cross-sectional transmission electron 
microscopy (TEM) images of the devices fabricated on the various quantum templates are 
shown in Fig. 3-5. Table 3-1 summarizes the geometric dimensions. 

Table 3-1. Summary of geometric dimensions for four types of URAM. 

4. Device performance 

4.1 Direct Current (DC) characteristics of URAM 

Once the devices are fabricated, their fundamental properties should be investigated to 
determine whether the current-voltage characteristics are acceptable. The drain current (Id) 
versus gate voltage (Vg) curve, i.e. the transfer characteristic, is commonly monitored, 
providing important parameters such as threshold voltage (Vt), on-current (Ion), 
off-current (Ioff), subthreshold slope (SS), and drain induced barrier lowering (DIBL). 
Fig. 4-1 shows the transfer plot for URAM. The SOI exhibits the steepest SS because its 
depletion capacitance is the smallest. Whereas the Vt values of the SOI, SOSC, and SOSG are 
similar, that of the SONW is larger because of its high body doping concentration. In order 
to avoid an electrical short between the n+ source/drain and the n+ well, the body doping 
concentration must be high. Thus, the driving current degrades due to the mobility 
degradation stemming from impurity scattering. The other parameters are superior to those 
of the counterpart devices (planar single-gate structure), which is attributed to the 
three-dimensional device structure. The device parameters are summarized in Table 4-1. 

Fig. 4-1. Drain current versus gate voltage characteristics for various types of URAM. The 
superior device properties are attributed to the three-dimensional device structure. 





                     SOI             SOSC            SOSG            SONW
Threshold voltage    0.21 V          0.29 V          0.33 V          0.5 V
Subthreshold slope   85 mV/dec       93 mV/dec       95 mV/dec       101 mV/dec
DIBL                 32 mV/V         110 mV/V        115 mV/V        151 mV/V
Off-current          9.3×10^-11 A    2.4×10^-10 A    1.3×10^-10 A    5.1×10^-11 A
On-current           1.0×10^-5 A     8.2×10^-6 A     3.7×10^-6 A     7.6×10^-6 A



Table 4-1. Summary of the device performance. The high threshold voltage of SONW is 
attributed to the high body doping concentration. 

The simplest method to verify whether impact ionization generates excess holes that are 
stored inside the body is to examine the kink point in the drain current (Id) versus drain 
voltage (Vd) curve, i.e. the output characteristic. As the drain voltage increases, the 
impact ionization process begins to occur beyond a certain drain voltage, generating 
electron-hole pairs. While the generated electrons flow out toward the drain terminal, the 
generated holes are repelled into the body by the positive drain voltage. In bulk 
substrates, these holes are generally collected by the grounded substrate terminal, 
appearing as substrate current. If the body is electrically floating, however, the holes 
accumulate and act as an extra quasi-gate. Therefore, the accumulation of excess holes 
increases the current beyond a certain drain voltage, and anomalous output characteristics 
can be found. Fig. 4-2 shows the output characteristics for URAM. The kink points confirm 
that the excess holes are effectively accumulated, even in the bulk substrates that are 
quantum mechanically engineered. 

Fig. 4-2. Drain current versus drain voltage characteristics for various types of URAM. As 
the drain voltage increases, the excess holes generated by impact ionization are stored in 
the floating body, resulting in a kink in the saturation region. 



4.2 Flash memory characteristics 

Flash memory performance is normally evaluated in terms of four aspects: program speed, 
erase speed, data retention time, and endurance cycles. For nonvolatile memory applications, 
the cells should satisfy the 10-year data retention requirement with $10^7$ program/erase 
endurance cycles. The ability to store and recover data after ten years is called 
'retention', and the ability to withstand repeated program/erase cycles is called 
'endurance'. The program/erase can be carried out by Fowler-Nordheim (FN) tunneling or 
hot-carrier injection (HCI). In this study, FN tunneling is used. Fig. 4-3 shows 
representative program/erase transient characteristics, obtained from the SOI device. As 
the program/erase voltages are increased, a larger threshold voltage shift is achieved. In 
addition, as the program/erase time is increased, the threshold voltage first shifts and 
then saturates after a certain time. Normally, the erase speed is slower than the program 
speed because the tunneling efficiency of holes is lower than that of electrons, owing to 
the higher effective mass and energy barrier height on the valence band side. Thus, in a 
memory array, the erase operation is normally carried out by block erasing to improve the 
erasing throughput. Here, a Vt window of 3.3 V is achieved with programming at 11 V for 
80 µs and erasing at -11 V for 10 ms. 

Fig. 4-3. Program/erase transient characteristics of the Flash memory mode in the SOI URAM: 
(a) program transient and (b) erase transient characteristics. (Han et al., 2007) 

The retention and endurance are crucial factors that determine the reliability of the Flash. 
Fig. 4-4 shows that 10-year retention and $10^7$ cycles are guaranteed with a 1.9 V 
detection window. Table 4-2 summarizes the reliability factors for the various templates. 
No retention or endurance failure occurs as long as the detectable threshold voltage window 
is greater than 1 V. The endurance failure for SOSG is speculated to be caused not by a 
structure-related failure, but by a process-induced failure of the O/N/O stack. 

Fig. 4-4. (a) Data retention and (b) endurance characteristics of the Flash mode in the SOI 
URAM. (Han et al., 2007) 





                                 SOI      SOSC     SOSG     SONW
ΔVt (11 V/80 µs, -11 V/10 ms)    3.3 V    3.6 V    3.0 V    3.6 V
ΔVt (after 10 years)             1.9 V    3.5 V    2.6 V    1.6 V
ΔVt (after 10^7 cycles)          2.7 V    3.2 V    Fail     2.7 V

Table 4-2. Summary of program/erase efficiency and reliability for various types of URAM. 
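
The pass criterion stated above (a detectable threshold voltage window greater than 1 V) can be applied to the tabulated values in a few lines. The Python sketch below is only an illustration: the window values are transcribed from Table 4-2, the "Fail" entry is represented as `None`, and the dictionary keys and function names are hypothetical.

```python
from typing import Dict, Optional

MIN_WINDOW_V = 1.0  # detection criterion stated in the text

# Threshold-voltage windows in volts, transcribed from Table 4-2.
# None marks the entry reported as "Fail".
TABLE_4_2: Dict[str, Dict[str, Optional[float]]] = {
    "SOI":  {"initial": 3.3, "after_10_years": 1.9, "after_1e7_cycles": 2.7},
    "SOSC": {"initial": 3.6, "after_10_years": 3.5, "after_1e7_cycles": 3.2},
    "SOSG": {"initial": 3.0, "after_10_years": 2.6, "after_1e7_cycles": None},
    "SONW": {"initial": 3.6, "after_10_years": 1.6, "after_1e7_cycles": 2.7},
}


def passes(window_v: Optional[float]) -> bool:
    """A window passes if it was measured and exceeds the detection criterion."""
    return window_v is not None and window_v > MIN_WINDOW_V


def reliability_report(
    table: Dict[str, Dict[str, Optional[float]]]
) -> Dict[str, Dict[str, bool]]:
    """Evaluate retention and endurance for each substrate type."""
    return {
        name: {
            "retention": passes(row["after_10_years"]),
            "endurance": passes(row["after_1e7_cycles"]),
        }
        for name, row in table.items()
    }


if __name__ == "__main__":
    for name, result in reliability_report(TABLE_4_2).items():
        print(name, result)
```

With these values, every template meets the retention criterion, and only the SOSG entry fails the endurance check, in line with the discussion above.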
4.3 Capacitorless 1T-DRAM characteristics 

The capacitorless 1T-DRAM mode is characterized with a customized measurement system, 
shown in Fig. 4-5. A computer controls the pulse generator (Agilent 81110A), 
oscilloscope (Agilent 54542C), and current amplifier (Keithley 428). The pulse generator 
applies voltage patterns to the device. The source current is amplified by the current 
amplifier, converted into a voltage, and monitored by the oscilloscope. For low-noise 
measurement, a low-noise cable with a length of 50 cm is used. The device is tested on a 
probe station (Cascade R4840). All operations use a gate voltage of 1 V. This is not a 
strict requirement; it is chosen to obtain a monotone waveform that simplifies the sensing 
circuitry. 

Fig. 4-5. Customized measurement system and the operational pulse waveform. 

In order to minimize the leakage paths in the three-dimensional FinFET, the fin width 
should be as narrow as possible. This means that a fully depleted body is desirable in terms 
of scalability. The capacitorless 1T-DRAM, however, requires a partially depleted body, i.e. 
a wider fin, to store a detectable amount of holes. To reconcile scalability with memory 
functionality, the fin is divided into two regions (Han et al., 2009). Fig. 4-6 compares the 
conventional FinFET with the proposed structure. Whereas the fin is fully surrounded by the 
gate in the conventional FinFET, the fin of the proposed device is only partially covered. 
The essence of the proposed structure is that the hole accumulation region is spatially 
separated from the inverted channel. The upper part covered by the gate, which is fully 
depleted, provides a conduction path. The lower part covered by the STI oxide, which is 
partially depleted, serves as hole storage. Therefore, scalability and memory functionality 
(the floating body effect) are attained at the same time. 

Fig. 4-6. Comparison of (a) the conventional fully-depleted FinFET SONOS and (b) the 
proposed half fully-depleted and half partially-depleted FinFET SONOS. The simulated 
contours of the body potential confirm that a partially depleted region that accommodates 
more holes is beneficial for proper 1T-DRAM operation. Consequently, the proposed FinFET is 
superior to the conventional FinFET. (Han et al., 2007) 

Fig. 4-7 shows the program/erase characteristics of the capacitorless 1T-DRAM. As 
mentioned in Section 2.3, the program/erase voltages should be optimized in order to avoid 
undesired charge trapping in the O/N/O layer. The program uses Vg,pgm = 1V and 
Vd,pgm = 1-5V, the erase uses Vg,ers = 1V and Vd,ers = -1V, and the read voltages are 
Vg,read = 1V and Vd,read = 0.4V. Before the capacitorless 1T-DRAM mode is used, the initial 
Vt is set to 0.2 V. The data states are clearly distinguished with a 7 µA sensing window 
after 80 ms of data retention, whereas the conventional device exhibits a smaller sensing 
window. This is attributed to the increased excess hole accumulation shown in Fig. 4-6. 

Fig. 4-7. Source current for the capacitorless 1T-DRAM mode. The two data states are clearly 
identified because more holes are accumulated in the partially-depleted URAM, whereas 
the source current difference is relatively small in the conventional fully-depleted URAM. 
(Han et al., 2007) 

In the SONW substrate, the buried n-type well is embedded inside a p-type bulk substrate. 
The junction of the p-type body and the n-type well forms a pn built-in potential barrier; 
thus, the excess holes can be retained inside the p-type body region. In order to verify that 
the excess holes can indeed be confined, the simulated contours of the hole concentration 
after the program are shown in Fig. 4-8. In conventional bulk substrates, excess holes are 
generally collected by the grounded substrate. In the SONW substrate, the holes encounter 
the n-well junction barrier and are thus accumulated inside the body. 
Fig. 4-8. Simulated contours of the hole concentration biased at the hold condition after impact 
ionization: (a) conventional bulk FinFET and (b) SONW URAM. In contrast to the 
conventional case, SONW URAM stores the excess holes in the body region. (Han et al., 
2008a) 
Fig. 4-9 shows the program/erase characteristics. The important feature of the bulk 
substrates is that the barrier height can be modulated by the substrate voltage. In other 
words, the ability to retain holes can be improved by a proper substrate voltage. Applying 
a weak positive voltage enlarges the hole barrier height and can enhance the sensing 
window. The sensing window and retention time increase from 4µA with 8msec to 7µA 
with 30msec as the substrate voltage is increased from 0V to 0.3V. In the case of a strong 
positive voltage, however, the capacitorless 1T-DRAM cannot work because the forward 
biased source/drain-to-body junction diode is inevitably turned on. 
Fig. 4-9. Source current for the capacitorless 1T-DRAM of SONW. The sensing window is 
widened at a small positive substrate voltage. (Han et al., 2008a) 

Since the SONW substrate requires a deep implantation process, it is hard to define an 
accurate and abrupt quantum engineered junction profile. The SOSC would be preferred, as 
its energy band is determined by epitaxial growth and the mole fraction of C in Si1-yCy. In 
addition, whereas the buried n-well should be located far from the source/drain junction in 
order to avoid an electrical short, the band offset interface of Si/Si1-yCy can be closer to 
the source/drain, so that the influence of stored holes on the inverted channel becomes 
stronger. Therefore, SOSC gives rise to improvements in performance. Fig. 4-10 shows the 
program/erase characteristics. The sensing window of 11µA with a retention time of 50msec 
at a substrate voltage of 0.3V is wider than that of SONW (Han et al., 2008). Also, 
the sensing window is wider at Vsub=0.3V than at Vsub=0V, as predicted. 



[Plot for Fig. 4-10: source current versus time, t (msec).] 

Fig. 4-10. Source current for the capacitorless 1T-DRAM of SOSC. The small positive 
substrate voltage raises the sensing current window. (Han et al., 2008a) 
The above three substrates, SOI, SONW, and SOSC, form a quantum barrier, i.e., the 
energy band of the floating body is above that of the quantum engineered substrate. In 
contrast, SOSG is a quantum well structure because the energy band of the floating body 
is below that of the quantum engineered substrate. The quantum barrier type substrates use 
their bodies as the conduction path and as the storage region simultaneously. This 
condition can cause the excess holes to disappear easily by recombination with the inverted 
electrons, leading to degradation in the data retention time. The quantum well, however, can 
separate the excess holes from the conduction electrons; thus, the stored charge loss via 
recombination with inverted electrons is expected to be minimized, and improved 
performance is predicted. The Si/Si1-xGex/Si stack, SOSG, forms the potential well structure 
because the valence band energy of Si1-xGex is higher than that of Si, as shown in Fig. 4-7. In 
SOSG, the top Si serves as the conduction channel, and the central Si1-xGex layer is devoted to 
hole storage. The major advantage compared to SOSC is that, whereas the solid 
solubility of carbon in silicon is limited to 5%, the germanium content can be adjusted from 
0% to 100%, which allows wide band offset modulation by changing the stoichiometry of 
Si1-xGex. Therefore, the SOSG can provide more degrees of freedom in the energy band design 
because the depth of the potential well is determined by the germanium content. 
The impact of germanium on the band offset has been reported theoretically, and 
the valence band offset between Si and Si1-xGex increases linearly with the content x. 
The simulated distribution of excess holes after programming and the maximum hole 
concentration for various values of x are shown in Fig. 4-11. The holes are found to 
accumulate preferentially in the Si1-xGex layer. The hole concentration increases exponentially 
as the valence band offset is increased. However, the hole concentration starts to 
saturate beyond a band offset of 0.24eV, which corresponds to a germanium content of x=0.4. 
This means that a Ge content higher than x=0.4 will be ineffective in terms of the ability 
to store holes. In other words, a very deep potential well is not always necessary for higher 
performance (Han et al., 2008b). 
Fig. 4-11. (a) Simulation results of hole concentration biased at the hold condition after impact 
ionization. The stored excess holes are found in the Si1-xGex potential well. (b) The valence 
band offset as a function of germanium content and the resultant hole concentration. (c) The 
energy band offset in the SOSG structure. The amount of stored holes increases exponentially 
as the valence band offset is increased, but the hole concentration starts to saturate 
beyond a band offset of 0.24eV. (Han et al., 2008b) 
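
The linear offset trend and the quoted saturation point can be put into numbers with a minimal Python sketch. It assumes that the valence band offset is a straight line through the origin fixed by the single quoted point (0.24eV at x=0.4), which gives a slope of 0.6eV per unit germanium fraction, and it uses an idealized Boltzmann factor exp(dEv/kT) as a stand-in for the exponential growth of the hole concentration. The slope, the function names, and the Boltzmann form are illustrative assumptions, not the simulation model behind Fig. 4-11, and the sketch does not reproduce the saturation beyond 0.24eV.

```python
import math

K_B_EV_PER_K = 8.617e-5  # Boltzmann constant in eV/K


def valence_band_offset_ev(x, slope_ev=0.6):
    """Valence band offset (eV) for germanium fraction x.

    Assumed linear through the origin; the default slope is derived from
    the single point quoted in the text (0.24 eV at x = 0.4).
    """
    if not 0.0 <= x <= 1.0:
        raise ValueError("germanium fraction x must lie in [0, 1]")
    return slope_ev * x


def boltzmann_factor(offset_ev, temperature_k=300.0):
    """Idealized exp(dEv / kT) growth factor; ignores the reported saturation."""
    return math.exp(offset_ev / (K_B_EV_PER_K * temperature_k))


if __name__ == "__main__":
    for x in (0.1, 0.2, 0.3, 0.4):
        offset = valence_band_offset_ev(x)
        factor = boltzmann_factor(offset)
        print(f"x = {x:.1f}: offset = {offset:.2f} eV, factor = {factor:.3g}")
```

Under these assumptions, x=0.3 corresponds to an offset of about 0.18eV. Any content above x=0.4 would still deepen the well in this idealized model, which is exactly where the simulated hole concentration is reported to stop following the exponential trend.
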
Fig. 4-12 shows the program/erase characteristics for two germanium contents, x=0.3 and 
x=0.5. Despite the larger sensing window at x=0.5, the retention is found to be inferior to 
that at x=0.3 because the higher defect density caused by the atomic lattice 
mismatch at x=0.5 induces faster recombination during the read operation. In addition to the 
retention degradation in the programmed state, a deeper potential well is also found to 
degrade the retention in the erased state. The reason is speculated to be that holes 
diffuse easily into the potential well from the neighboring p-type silicon layers, as 
illustrated in Fig. 4-13. As a result of the trade-off between sensing current window and 
retention time, the optimized stoichiometry of Si1-xGex is x=0.3. The retention time (on the order 
of microseconds) appears to be insufficient for practical application; however, refinement of 
the epitaxial process and geometric optimization of the 3-D structure will enhance the 
performance. 
Fig. 4-12. Capacitorless 1T-DRAM characteristics for different germanium contents in Si1-xGex 
of SOSG. The higher x exhibits a wider sensing window at the beginning of the sensing, but 
also faster charge loss. (Han et al., 2008b) 
Fig. 4-13. (a) Transmission electron microscopy images of x=0.4 and 0.3 and (b) schematics 
for retention degradation mechanisms. The high germanium content induces a high lattice 
mismatch because the lattice constant of germanium is larger than that of silicon. The 
defects originating from the lattice mismatch reduce the data retention at the programmed 
state via charge recombination, and a deeper potential well degrades the data retention at 
the erased state due to hole-to-hole repulsion and its repellent diffusion mechanism. (Han et 
al., 2008b) 
Table 4-3 summarizes the features and performance of the four types of URAM. The SOI 
substrate exhibits the fastest write speed, the widest sensing window, and the longest 
retention among the four substrates. Among the bulk types, the SOSC substrate displays 
superior performance. 





                                  SOI                   SONW                  SOSC                  SOSG
substrate                         SOI                   bulk                  bulk                  bulk
energy band type                  valence band barrier  built-in potential    valence band barrier  valence band well
energy band abruptness            abrupt                gradual               abrupt                abrupt
band offset / built-in potential  4.8 eV                0.9 eV                0.18 eV               0.05~0.32 eV
channel and storage region        share                 separate              share                 separate
program speed                     6 nsec                20 nsec               20 nsec               50 nsec
sensing window                    ~20 µA                ~7 µA                 ~11 µA                ~8 µA
retention time                    ~80 msec              ~30 msec              ~50 msec              ~600 µsec

Table 4-3. Summary of the features and performance of various URAMs. All data were 
measured at 300K. 
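
The ranking stated for Table 4-3 can be checked with a small Python sketch that holds the three numeric rows of the table and picks the best entry for each. The sensing windows are read as microamperes, as in the running text, and the ~600 µsec retention of SOSG is entered as 0.6 msec. The class and function names are hypothetical and serve only this illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class UramSubstrate:
    name: str
    substrate: str       # "SOI" or "bulk"
    program_ns: float    # program speed in nsec
    window_ua: float     # sensing window in microamperes
    retention_ms: float  # retention time in msec


# Approximate values transcribed from Table 4-3 (measured at 300 K).
TABLE_4_3 = [
    UramSubstrate("SOI", "SOI", 6, 20, 80),
    UramSubstrate("SONW", "bulk", 20, 7, 30),
    UramSubstrate("SOSC", "bulk", 20, 11, 50),
    UramSubstrate("SOSG", "bulk", 50, 8, 0.6),
]


def best(rows, key, largest=True):
    """Return the name of the row with the largest (or smallest) key value."""
    pick = max if largest else min
    return pick(rows, key=key).name


if __name__ == "__main__":
    bulk = [row for row in TABLE_4_3 if row.substrate == "bulk"]
    print("fastest program:", best(TABLE_4_3, lambda r: r.program_ns, largest=False))
    print("widest window:", best(TABLE_4_3, lambda r: r.window_ua))
    print("longest retention:", best(TABLE_4_3, lambda r: r.retention_ms))
    print("widest bulk window:", best(bulk, lambda r: r.window_ua))
```
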



5. Soft-programming issue and solutions 

5.1 Soft-programming issue 

URAM can be realized by combining the O/N/O gate dielectric, which stores electrons, and the 
floating body, which captures holes, in a single transistor. Unfortunately, the impact ionization 
used to program the capacitorless 1T-DRAM can adversely affect the stored charges in the 
O/N/O layer. This gives rise to an undesired threshold voltage shift, which is called 
'soft-programming'. A strong impact ionization condition provides faster program speed, a 
wider sensing window, and longer retention time, but it simultaneously increases the 
hot-electron injection into O/N/O, leading to instability as a result of the disturbance between 
the Flash and capacitorless 1T-DRAM modes. Thus, the program condition of the capacitorless 
1T-DRAM has had to be bounded in order not to disturb the Flash memory states. 
This is becoming an increasingly important concern. 

In order to clarify the soft-programming, Fig. 5-1 shows the capacitorless 1T-DRAM 
performance after 10^5 cyclic operations. When the drain voltage for a program is 1.8V, a 
sensing current window of 6µA is sustained, which means the interference is negligible. When 
the drain voltage is increased to 2.2V to improve the performance, hot-electron injection 
occurs. This causes gradual charge trapping in the O/N/O, and the 
resultant sensing window decreases. Therefore, soft-programming poses a constraint on the 
maximum program voltage. In order to overcome this issue, soft-programming immune 
device structures and operational methods are suggested in the following subsections. 
Fig. 5-1. Program/erase characteristics by the impact ionization method for different 
program voltages. The sensing window is reduced at a high drain voltage as the stress 
cycles increase due to hot-electron injection into the nitride layer. (Han et al., 2009a) 



5.2 Soft-programming immune structure: gate-to-S/D nonoverlap structure 

Soft-programming tends to occur when the impact ionization process is triggered under the gate. 
Thus, if the impact ionization region is steered away from the O/N/O layer, hot-electron 
injection can be mitigated. The impact ionization process occurs in the region with the 
highest electric field, i.e., the drain end. Thus, a gate-to-source/drain nonoverlap creates an 
impact ionization region located outside of the gate. Even though the impact ionization 
triggers hot-electron injection, the charge trapping is alleviated since there are no trap 
sites there. Thus, the constraint on the program bias is relieved. For this purpose, a junction 
nonoverlap structure is fabricated and compared to the conventional overlap structure. Fig. 5-2 
shows the fabricated device images. The nonoverlap length is 20nm. 




Fig. 5-2. (a) Schematics of the gate-to-source/drain overlap and nonoverlap structures and (b) 
transmission electron microscopy image of the gate-to-source/drain nonoverlap device. 
The body thickness is 50nm, the gate length is 110nm, and the nonoverlap length is 20nm. 
(Han et al., 2009a) 

Fig. 5-3 shows the memory characteristics for both structures. Even though the nonoverlap 
device may suffer from degradation in the impact ionization efficiency, the sensing current 
window of the nonoverlap device is found to be wider than that of the overlap one (Fig. 5-3a) 
because the nonoverlap reduces the junction leakage and recombination rate. In addition, the 
effective volume of the floating body is extended by the amount of nonoverlap. As a result, 
the reduced impact ionization efficiency can be compensated (Song et al., 2008). In the Flash 
memory characteristics shown in Fig. 5-3b, a threshold voltage window of 4.3V is achieved. 
The threshold voltage of a fresh device is higher in the nonoverlap device than in the overlap 
device, and the threshold voltage window of the nonoverlap structure is narrower than that of 
the overlap structure. Despite the degradation in the threshold voltage window for Flash 
memory, the window of 4.3V is sufficient to identify the data states (Han et al., 2009). 
Fig. 5-3. (a) Capacitorless 1T-DRAM and (b) Flash memory characteristics. The nonoverlap 
structure shows a wider sensing current window in capacitorless 1T-DRAM mode, but a 
narrower threshold voltage window in Flash mode. (Han et al., 2009a) 

In order to evaluate the soft-programming immunity, a stress test is carried out. While the 
program voltage is applied, the threshold voltage shift is periodically monitored during the 
operation cycles. As shown in Fig. 5-4, whereas the overlap structure shows a threshold 
voltage shift of 0.2V, the nonoverlap device exhibits distinctly superior immunity against 
soft-programming. 
Fig. 5-4. The threshold voltage shift monitored during the cyclic operations. The nonoverlap 
structure shows superior immunity against the soft-program. (Han et al., 2009a) 
It is worthwhile to note that the major weakness of the nonoverlap junction structure is that 
the parasitic voltage drop across the series resistance of the nonoverlap region reduces the impact 
ionization efficiency. This drawback can be offset by increasing the dielectric 
constant of the gate offset spacer. A high dielectric constant of the spacer can increase the 
effect of the gate fringing field on the nonoverlap region, which is expected to boost the 
impact ionization rate (Ma et al., 2007). Therefore, the abundance of excess holes can further 
improve the current drivability and recover the performance of the capacitorless 1T-DRAM 
lost to the sacrificed impact ionization efficiency. 

5.3 Soft-programming immune operation: gate-induced-drain-leakage program 

To date, impact ionization has commonly been used to create excess holes in the body. However, 
there is another method to generate excess holes in place of impact ionization: 
gate-induced-drain-leakage (GIDL). A device biased in the GIDL condition, i.e., negative gate 
and positive drain voltage, creates excess holes in the body by band-to-band tunneling. The 
impact ionization program wastes significant power since it is triggered by a high drain 
current. GIDL, however, does not require such a drain current; thus, low power 
operation is feasible. If the O/N/O is in the erase saturation state prior to activating the 
capacitorless 1T-DRAM mode, the hole injection into O/N/O is effectively restricted. In 
addition, the hole injection is further suppressed because the effective mass and energy band 
barrier of the hole on the valence band side are high. Fig. 5-5a shows the program/erase 
pulse waveform of the GIDL program method and the resultant sensing current. The current 
window of 12µA with 50msec data retention facilitates the data sensing. In order to verify 
the immunity against soft-programming, the stress test is carried out with the impact ionization 
and GIDL program methods. The amount of trapped charge is evaluated by monitoring the 
shift of the threshold voltage. Fig. 5-5b shows the impact of cyclic capacitorless 1T-DRAM 
operation on the threshold voltage shift. Whereas the impact ionization condition induces charge 
trapping and results in a threshold voltage shift, the GIDL method does not. Thus, the GIDL 
method is an effective tool to achieve soft-program immune operation (Han et al., 2009). 
Fig. 5-5. (a) Program/erase characteristics of the GIDL program method and (b) threshold 
voltage shift versus cyclic stress time. The GIDL program method does not shift the 
threshold voltage, while the impact ionization program does. (Han et al., 2009b) 
It is worthwhile to note that the GIDL program method in URAM may be inefficient in 
terms of the generation efficiency of holes. As the Flash memory utilizes an O/N/O gate 
dielectric, such a thick gate dielectric hampers the achievement of sufficient band bending for 
band-to-band tunneling. Thus, a programming time of 100nsec was used, which would be 
too long for embedded systems. Since the thickness of O/N/O cannot be scaled further 
while sustaining acceptable nonvolatility, a higher gate voltage would be required, but this 
also poses power issues. In order to overcome this drawback, a p+ polysilicon gate on the 
p-type body can be used. Since the flat band voltage difference between p+ polysilicon and n+ 
drain is higher than that between n+ polysilicon and n+ drain, a higher GIDL current is induced at 
a given gate voltage (Lindert et al., 1996). Therefore, the implementation of the p+ 
polysilicon gate can yield improved memory characteristics in the GIDL method. 



5.4 High performance and soft-programming immune operation: parasitic BJT read 

In the first prototype of URAM, the impact ionization program condition caused a 
soft-program issue. The GIDL program suppresses the soft-program but tends 
to sacrifice program efficiency. In summary, both methods have distinctive strengths as 
well as weaknesses. In this section, a third method is introduced for 
improved performance with soft-program free operation. It is important to note that the 
floating body MOSFET contains a parasitic lateral bipolar junction transistor (BJT) 
composed of the n+ source, p-type body, and n+ drain, which correspond to the emitter, base, 
and collector, respectively. As the p-type body is floating, the BJT with the floating base 
cannot be activated under normal MOSFET operating conditions. However, if a high 
voltage is applied to the drain, the hole injection into the floating base can turn on the parasitic 
BJT, and the drain current is maintained even though the MOSFET is supposed to be turned 
off (Chen et al., 1988). Fig. 5-6 shows the double-sweep transfer characteristics. At a low 
drain voltage, the normal MOSFET transfer curve is shown, and there is no hysteresis. At a 
high drain voltage, however, the subthreshold slope approaches 0mV/dec at the onset of the 
parasitic BJT activation, and a hysteresis loop is generated. Thus, at a given read 
voltage, the bistable current-voltage characteristics can be utilized as a single memory 
transistor, as indicated in Fig. 5-6 (Okhonin et al., 2007). 
Fig. 5-6. Double-sweep drain current versus gate voltage characteristics of the SOI 
URAM. At V_D=1.8V, the device shows normal MOSFET transfer characteristics. At V_D=2.2V, 
the parasitic BJT begins to work, and the hysteresis loop is created. 
Fig. 5-7 compares the capacitorless 1T-DRAM characteristics of the previous 
two program methods and the BJT read method. A pulse width of 5nsec is applied for 
program and erase in the BJT mode. The programming is carried out by impact ionization or 
GIDL, and the erase is accomplished by forward junction current. The difference lies in the read 
condition. The previous methods used a low drain voltage, typically Vd<0.6V, in order not to 
disturb the body charge state, resulting in a small sensing current. In addition, the sensing 
current window was gradually narrowed by the generation and recombination processes. In 
contrast, the parasitic BJT read uses a high drain voltage, at least Vd>2V, to activate bipolar 
action. While the negative gate voltage turns off the MOSFET, the parasitic BJT is either 
activated or deactivated according to the presence or absence of excess holes. When 
excess holes exist in the body, the parasitic BJT is activated, and the current corresponds 
to point A in Fig. 5-6. On the other hand, when the excess holes are eliminated, the parasitic 
BJT is deactivated and no current flows, which corresponds to point B in 
Fig. 5-6. In particular, once the parasitic BJT is activated, the high BJT current is latched 
despite the MOSFET being in the off state because holes are continuously supplied as long as 
the read voltage is applied. Therefore, the BJT read method is considered to be completely 
non-destructive, and the sensing current window is high enough that a sense amplifier may 
not be necessary to identify the data. 
Fig. 5-7. Comparison of the capacitorless 1T-DRAM characteristics with various methods. In 
the conventional read method, the sensing current window is gradually narrowed with the 
read time. In the BJT read method, the source current remains constant because the stored 
data at the BJT read condition is latched. 

It is important to note that the read and program operations correspond to hot-hole 
injection conditions, which can cause a threshold voltage shift during cyclic capacitorless 
1T-DRAM operations. This situation seems similar to the soft-program, but it turns out that the 
soft-program in the BJT mode is negligible, as shown in Fig. 5-8. If the nitride traps are 
saturated with holes, which can be done by an initialization step before entering the 
capacitorless 1T-DRAM mode, there are no additional threshold voltage shifts because no extra 
trap sites are available in the nitride. According to the stress test data, the threshold voltage 
shift is found to be negligible, resulting in stable operation. The soft-program free scheme can 
eliminate the operation loop of soft-programming verification and re-initialization that is 
required in the conventional methods, as shown in Fig. 5-9. The elimination 
of this redundant loop can greatly help preserve the integrity of the gate oxide, which can 
otherwise be degraded by repeated initialization processes. 
Fig. 5-8. Threshold voltage and sensing current window versus the operation cycles. The 
threshold voltage shift and sensing current window degradation are found to be negligible, 
which guarantees very stable URAM operation without soft-programming. 
Fig. 5-9. Operational sequence for URAM. (a) The conventional read method and (b) the 
parasitic BJT read method. In the conventional read method, the verification and 
re-initialization loop is necessary due to the soft-program issue. In contrast, the parasitic BJT 
read method excludes the redundant loop because the interference between the two modes is 
eliminated. 
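
The difference between the two sequences can be illustrated with a toy Python sketch that counts how often the verification step would force a re-initialization. It assumes a fixed threshold voltage shift per 1T-DRAM cycle and a fixed tolerance, both in millivolts; these parameters, the linear accumulation, and the function name are hypothetical simplifications and are not taken from the measurements in Fig. 5-8.

```python
def count_reinitializations(cycles, vt_shift_per_cycle_mv, vt_tolerance_mv):
    """Count verify-triggered re-initializations over a number of cycles.

    Each cycle adds a fixed soft-programming shift to the threshold voltage.
    When the accumulated shift exceeds the tolerance, the verification step
    fails and the O/N/O state is re-initialized (the shift is reset to zero).
    """
    if cycles < 0 or vt_shift_per_cycle_mv < 0 or vt_tolerance_mv <= 0:
        raise ValueError("cycles and shift must be >= 0, tolerance must be > 0")
    accumulated_mv = 0
    reinitializations = 0
    for _ in range(cycles):
        accumulated_mv += vt_shift_per_cycle_mv  # soft-programming in this cycle
        if accumulated_mv > vt_tolerance_mv:     # verification step fails
            accumulated_mv = 0                   # re-initialize the Flash state
            reinitializations += 1
    return reinitializations


if __name__ == "__main__":
    # Conventional read: a nonzero shift per cycle triggers the loop.
    print(count_reinitializations(10, 50, 100))
    # Soft-program free read: no shift, so the loop is never entered.
    print(count_reinitializations(100000, 0, 100))
```

With zero shift per cycle the loop body never triggers, which mirrors sequence (b): the verification and re-initialization branch can simply be left out.
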
6. Conclusions 

In this chapter, as we confront the challenges of current memory technology and as the design 
rule deviates from the historical scaling paradigm, a novel memory scheme is proposed to 
continue the roadmap beyond the end point of silicon based memory. Beyond the scaling and 
multi-bit eras, a multi-functional paradigm is proposed. Whereas the conventional fusion 
memory pursues high-cost multi-chip-package technology, multi-function is realized here in a 
single memory transistor. The functions of nonvolatile Flash memory and high-speed 
DRAM are co-integrated, and this memory is named Unified-RAM or URAM. The 
combination of the oxide/nitride/oxide gate dielectric and the floating body structure provides 
two functions in a single memory cell. In addition, the inherent operational bias domain of 
each function allows the functions to operate independently depending on the end user's demand. 
Various floating body substrates designed with consideration of quantum mechanics 
were proposed in order to confine the excess holes needed to operate the capacitorless 1T-DRAM. In 
addition to the conventional silicon on insulator (SOI) substrate, three bulk type floating 
body substrates were developed. The silicon on n-well (SONW) formed by deep ion 
implantation and the silicon on Si1-yCy (SOSC) formed by epitaxial growth were presented 
for the potential barrier type approach. Furthermore, the silicon on Si1-xGex (SOSG) was 
developed for the potential well type substrate. Even though the performance of bulk might 
be inferior to that of SOI, bulk can still be useful in terms of cost-effective manufacturing 
and heat dissipation, with a moderate sensing window. After the soft-program issue was 
identified, the parasitic bipolar junction transistor (BJT) read method was proposed 
for high performance with soft-programming immunity. 

As URAM is implemented by using standard semiconductor design and fabrication facilities, 
new products can be manufactured quickly, reducing development time and investment 
cost. The beauty of URAM lies in the fact that it does not require exotic semiconductor 
materials, oddly structured parts, exploratory insulators, or an extra photolithography step. 
URAM is considered to be a next-generation advanced memory technology that will 
open a new paradigm, and it will be a viable candidate for future embedded 
memory. 
1. Introduction 



Analog signal amplification in discrete-time systems can be performed by switched-capacitor 
amplifiers (Martin et al., 1987). The switched-capacitor amplifier has been used in the design of 
digital-to-analog converters (Yang & Martin, 1989). The schematic of the switched-capacitor 
amplifier is shown in Figure 1. 
Fig. 1. A differential-to-single-ended CMOS switched-capacitor amplifier. Depending on the 
input-stage clock signals, the amplifier can be either noninverting (as shown) or inverting 
(input-stage clocks shown in parentheses). 
Assuming an infinite op amp gain, the output voltage at the end of $\phi_2$ is given by 

$$V_{out}(nT) = \frac{C_1}{C_2}\, V_{in}(nT-1), \qquad (1)$$

irrespective of the op amp offset voltage. If the clock waveforms shown in parentheses are 
used, then an inverting function is realized, and 

$$V_{out}(nT) = -\frac{C_1}{C_2}\, V_{in}(nT), \qquad (2)$$

again independent of the op amp input offset voltage. During the reset phase ($\phi_1$), $C_3$ is 
connected in feedback around the op amp, which causes the output to change only by the op 
amp input offset voltage. The switches are realized as CMOS transmission gates. For low 
supply voltages, a conductance gap begins to appear around the middle of the supply range 
(Crols & Steyaert, 1994). This means that under low-voltage operation, this configuration no 
longer works. Existing solutions for low-voltage operation of switched-capacitor circuits 
include using a low threshold voltage process (Matsuya & Yamada, 1994), the switched-opamp 
technique (Baschirotto & Castello, 1997; Cheung et al., 2001; Cheung et al., 2002; Cheung et 
al., 2003; Crols & Steyaert, 1994; Peluso et al., 1997; Peluso et al., 1998; Sauerbrey et al., 2002; 
Waltari & Halonen, 2001; Wu et al., 2007), the opamp-reset switching technique (Chang & 
Moon, 2003; Keskin et al., 2002; Wang & Embabi, 2003), the voltage multiplier (charge pump) 
technique (Nicollini et al., 1996; Rombouts et al., 2001), the clock multiplier (clock booster) 
technique (Au & Leung, 1997; Rabii & Wooley, 1997), and the bootstrapped switch technique 
(Abo & Gray, 1999; Dessouky & Kaiser, 2001; Park et al., 2004). First, the use of 
low-threshold transistors involves special and high-cost technology (Matsuya & Yamada, 1994). 
The switched-opamp technique (Baschirotto & Castello, 1997; Cheung et al., 2001; Cheung et 
al., 2002; Cheung et al., 2003; Crols & Steyaert, 1994; Peluso et al., 1997; Peluso et al., 1998; 
Sauerbrey et al., 2002; Waltari & Halonen, 2001; Wu et al., 2007) and the opamp-reset switching 
technique (Chang & Moon, 2003; Keskin et al., 2002; Wang & Embabi, 2003) are only 
applicable to filters, delta-sigma modulators, and pipelined analog-to-digital converters. The 
main limitations of the voltage multiplier (charge pump) technique (Nicollini et al., 1996; 
Rombouts et al., 2001) are the gate-oxide breakdown reliability, the need to supply a dc 
current to the op amps from the multiplied supply (this necessitates the use of an external 
capacitor, with additional cost), and the conversion efficiency of the charge pump (which is 
lower than 100%). The clock multiplier (clock booster) technique (Au & Leung, 1997; Rabii & 
Wooley, 1997) suffers from the technology limitation associated with gate oxide 
breakdown. Device reliability can be assured in the bootstrapped switch technique (Abo & 
Gray, 1999; Dessouky & Kaiser, 2001; Park et al., 2004) because the terminal-to-terminal 
voltages of the MOSFET devices are kept within the rated operating supply voltage of the 
technology. The bootstrapped switch provides a small, nearly constant input resistance. The 
switch linearity is also improved, and signal-dependent charge injection is reduced. 
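
The conductance gap of a CMOS transmission gate mentioned above can be illustrated with a first-order Python sketch. It assumes the usual simplified picture with the NMOS gate at V_DD and the PMOS gate at V_SS: the NMOS conducts while the input is below V_DD - V_tn, and the PMOS conducts while the input is above V_SS + |V_tp|, so a gap opens once the supply range falls below V_tn + |V_tp|. Body effect is ignored, and the function name and the threshold values in the usage lines are illustrative assumptions, not figures from this chapter.

```python
def conductance_gap(v_dd, v_tn, v_tp_abs, v_ss=0.0):
    """Return the (low, high) input range where neither device conducts.

    Returns None when the NMOS and PMOS conduction ranges overlap, i.e.
    when the transmission gate passes the full supply range.
    """
    if v_tn < 0 or v_tp_abs < 0 or v_dd <= v_ss:
        raise ValueError("thresholds must be >= 0 and v_dd must exceed v_ss")
    nmos_upper_limit = v_dd - v_tn      # NMOS is on for inputs below this
    pmos_lower_limit = v_ss + v_tp_abs  # PMOS is on for inputs above this
    if nmos_upper_limit < pmos_lower_limit:
        return (nmos_upper_limit, pmos_lower_limit)
    return None


if __name__ == "__main__":
    # Assumed thresholds of 0.7 V (NMOS) and 0.8 V (PMOS magnitude).
    print(conductance_gap(3.3, 0.7, 0.8))  # ample supply: ranges overlap
    print(conductance_gap(1.0, 0.7, 0.8))  # 1 V supply: a mid-range gap opens
```

With these assumed thresholds, a 1V supply leaves inputs between roughly 0.3V and 0.8V unswitched, which is the mid-supply region where the signal is normally centered.
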
To improve the overall linearity and to minimize the effect of common-mode interference and 
noise, the fully differential approach has gained wide acceptance for accurate and/or 
high-speed signal processing. The switched-capacitor amplifier in (Martin et al., 1987) is a 
differential-to-single-ended design. A fully differential switched-capacitor amplifier using 
series compensation MOSFET capacitors has been presented in (Yoshizawa et al., 1999). 
However, its operating voltage is +2.5 V. Consequently, there is an increasing demand to 
extend these improvements to this circuit. 

This chapter describes the design of two 1V fully differential CMOS switched-capacitor 
amplifiers in a standard CMOS technology using improved bootstrapped switches. In 
section 2, the circuit realization of these two switched-capacitor amplifiers is addressed. In 
section 3, the circuit design of the low-voltage building blocks is described. Experimental results 
are presented in section 4 to support the ideas put forth in this chapter. Finally, a conclusion is given. 



2. Circuit Description 
Fig. 2. First low-voltage fully differential CMOS switched-capacitor amplifier. Depending on 
the input-stage clock signals, the amplifier can be either noninverting (as shown) or 
inverting (input-stage clocks shown in parentheses). 

Figure 2 shows the first low-voltage fully differential CMOS switched-capacitor amplifier 
based on the improved bootstrapped switches described in section 3.2, where switches S1-S4 
and S1'-S4' are matched improved bootstrapped switch pairs and switches S5-S6 and S5'-S6' 
are matched NMOS switch pairs. In order to minimize the number of improved 
bootstrapped switches, two analog reference voltages are used: $V_{SS}$ at the op amp input, 
where a normal NMOS switch can be used to switch the lowest supply voltage, and a 
common-mode voltage, $V_{cm}$, at the op amp output and the circuit input to maximize 
the signal swing. The improved bootstrapped switch is used to switch signals at this voltage 
level. Figure 3 is the single-ended version of Figure 2. 

Fig. 3. Single-ended version of Fig. 2. 

To see how this circuit operates, consider the inverting circuit during the reset phase (φ1) 
and during the valid output phase (φ2), as shown in Figure 4. Based on the charge 
conservation principle, we can write: 

$$C_1\,(V_{SS} + V_{off} - V_{cm}) + C_2\,(V_{SS} + V_{off} - V_{cm}) = C_1\,[V_{SS} + V_{off} - V_{cm} - v_{in}(nT)] + C_2\,[V_{SS} + V_{off} - V_{cm} - v_{out}(nT)],$$

or

$$v_{out}(nT) = -\frac{C_1}{C_2}\,v_{in}(nT). \tag{3}$$
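
As a quick numerical check of Eq. (3), the charge balance can be solved for the output sample. The sketch below is illustrative only: the function name and the default values (V_SS = -0.5 V, V_cm = 0 V, a 5-mV offset) are assumptions made for the example, and C1 and C2 are set to the values reported in section 4. It suggests that the offset and the reference voltages cancel, leaving a gain of -C1/C2.

```python
def sc_amp_output(v_in, c1, c2, v_ss=-0.5, v_cm=0.0, v_off=0.005):
    """Solve the charge balance of Eq. (3) for v_out(nT).

    All default voltages are assumed example values, not measured data.
    """
    node = v_ss + v_off - v_cm      # term common to every bracket in Eq. (3)
    q_reset = (c1 + c2) * node      # total charge stored in the reset phase
    # Valid phase: c1*(node - v_in) + c2*(node - v_out) must equal q_reset.
    return (c1 * (node - v_in) + c2 * node - q_reset) / c2


# Capacitor values from section 4: C1 = 1.25 pF, C2 = 0.25 pF.
gain = sc_amp_output(0.1, 1.25e-12, 0.25e-12) / 0.1
print(f"gain = {gain:.3f}")
```

Changing `v_off`, `v_ss`, or `v_cm` in this model leaves the computed gain unchanged, which mirrors the cancellation visible in Eq. (3).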



It should be noted that the clock waveforms with the primed superscripts change before the 
nonprimed waveforms in order to reduce nonlinearities due to charge injection. 
Another technique to further reduce the number of improved bootstrapped switches is 
shown in Figure 5, where switches S1 and S4, and S1' and S4', are matched improved 
bootstrapped switch pairs. The switches connected to V_SS are realized with NMOS 
transistors, while those connected to V_DD are realized with PMOS transistors. In 
Figure 5 a single reference voltage at V_SS is used. However, the signal still varies around 
the common-mode voltage V_cm at the circuit input as well as at the op amp output to 
preserve the maximum swing. The difference between the two reference voltages is 
compensated by injecting a fixed amount of charge at the op amp input using extra capacitor 
pairs (Baschirotto & Castello, 1997). Figure 6 is the single-ended version of Figure 5. 

Fig. 4. Single-ended CMOS switched-capacitor amplifier, (a) during reset phase (φ1), (b) 
during valid output phase (φ2). 

To see how this circuit operates, consider the inverting circuit during the reset phase (φ1) 
and during the valid output phase (φ2), as shown in Figure 7. 
Based on the charge conservation principle, we can write: 

$$C_1\,(V_{SS} + V_{off} - V_{SS}) + C_2\,(V_{SS} + V_{off} - V_{SS}) + (C_{M1} + C_{M2})\,(V_{SS} + V_{off} - V_{DD})$$
$$= C_1\,[V_{SS} + V_{off} - V_{cm} - v_{in}(nT)] + C_2\,[V_{SS} + V_{off} - V_{cm} - v_{out}(nT)] + (C_{M1} + C_{M2})\,(V_{SS} + V_{off} - V_{SS}),$$

or

$$v_{out}(nT) = -\frac{C_1}{C_2}\,v_{in}(nT). \tag{4}$$

Fig. 5. Second low-voltage fully differential CMOS switched-capacitor amplifier. Depending 
on the input-stage clock signals, the amplifier can be either noninverting (as shown) or 
inverting (input-stage clocks shown in parentheses). 

Fig. 6. Single-ended version of Fig. 5. 

Fig. 7. Single-ended CMOS switched-capacitor amplifier, (a) during reset phase (φ1), 
(b) during valid output phase (φ2). 
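
The role of the extra capacitors in Eq. (4) can be illustrated numerically. The sketch below solves the charge balance for the output and is only a model under stated assumptions: the function name is hypothetical, the supplies are taken as ±0.5 V with V_cm midway between them, and the total compensation capacitance C_M1 + C_M2 = (C1 + C2)/2 is a value derived here from the algebra, not a value given in the text.

```python
def sc_amp2_output(v_in, c1, c2, c_m, v_dd=0.5, v_ss=-0.5, v_cm=0.0, v_off=0.0):
    """Solve the charge balance of Eq. (4) for v_out(nT).

    c_m is the total compensation capacitance C_M1 + C_M2 (assumed value).
    """
    x = v_ss + v_off                                     # op amp input node voltage
    # Reset phase: C1 and C2 referenced to V_SS, C_M referenced to V_DD.
    q_reset = (c1 + c2) * (x - v_ss) + c_m * (x - v_dd)
    # Valid phase without the v_out term: C1, C2 see V_cm, C_M sees V_SS.
    q_valid_known = c1 * (x - v_cm - v_in) + c2 * (x - v_cm) + c_m * (x - v_ss)
    return (q_valid_known - q_reset) / c2


C1, C2 = 1.25e-12, 0.25e-12          # capacitor values from section 4
C_M = (C1 + C2) / 2                  # assumed: cancels the reference difference
print(sc_amp2_output(0.1, C1, C2, C_M))   # close to -(C1/C2) * 0.1
print(sc_amp2_output(0.0, C1, C2, 0.0))   # large offset without compensation
```

In this model the output reduces to -(C1/C2) v_in only when the injected charge matches the difference between the two references; with no compensation capacitance the output carries a large offset.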

3. Low-voltage building blocks 

In this section, the low-voltage circuit building blocks used in the two fully differential 
CMOS switched-capacitor amplifiers are discussed. 



3.1 Op Amp 

Figure 8 shows the op amp used. It is based on a fully differential folded-cascode p-type 
two-stage Miller-compensated configuration. The second stage is a common-source amplifier 
with an active load, which also allows a large output swing. To avoid a common-mode 
feedback (CMFB) circuit for the first stage, transistors M51, M52, M61, and M62 are used, 
similar to (Waltari & Halonen, 1998). For the second stage, a simple passive 
switched-capacitor CMFB circuit, shown in Figure 9, is used. The improved bootstrapped 
switches are used to connect and disconnect the common-mode sensing capacitor. 

Fig. 9. Common-mode feedback circuit for the low-voltage op amp. 

3.2 Improved bootstrapped switch 

The improved bootstrapped switch shown in Figure 10 is utilized in the proposed circuits. 
The circuitry is an improved version of that presented in (Abo & Gray, 1999). In the circuit 
presented in (Abo & Gray, 1999), the voltage at the drain side of the main switch M11 must 
always be higher than that at the source side at the switching moment to prevent the 
gate-drain voltage from exceeding V_DD during the turn-on transient. To overcome this 
limitation, an additional transistor M14 has been added on the drain side, such that the 
switch M11 becomes completely symmetrical. This bootstrapping circuit thus allows 
rail-to-rail switch operation (transistor M11) while limiting all gate-source/drain voltages 
to V_DD, avoiding any oxide overstress. 
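
To illustrate why bootstrapping gives a nearly constant on-resistance, the sketch below compares a conventional NMOS switch with an idealized bootstrapped one using the first-order triode-region model R_on = 1 / (k'(W/L)(V_GS - V_th)). Everything numeric here is an assumption for illustration (threshold voltage, transconductance factor, V_SS taken as 0 V with a 1-V supply), and body effect and the finite bootstrap capacitor are ignored.

```python
import math

V_DD = 1.0  # total supply in volts, with V_SS taken as 0 V (assumed)


def r_on(v_gs, v_th=0.5, k_wl=2e-3):
    """First-order triode on-resistance; infinite when the device is off.

    v_th and k_wl (k' * W/L in A/V^2) are hypothetical device parameters.
    """
    v_ov = v_gs - v_th
    return 1.0 / (k_wl * v_ov) if v_ov > 0 else math.inf


def conventional(v_in):
    # Gate tied to V_DD, so V_GS shrinks as the input rises.
    return r_on(V_DD - v_in)


def bootstrapped(v_in):
    # Gate tracks v_in + V_DD, so V_GS stays at V_DD for any input.
    return r_on(V_DD)


for v in (0.0, 0.2, 0.4, 0.6):
    print(v, conventional(v), bootstrapped(v))
```

With these assumed parameters the conventional switch stops conducting once the input comes within a threshold voltage of the supply, while the idealized bootstrapped switch keeps the same resistance over the whole input range.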
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Fig. 10. Improved bootstrapped switch. 

4. Experimental results 

Based on the principles presented earlier, we have designed two 1-V fully differential CMOS 
switched-capacitor amplifiers. Both amplifiers were operated from ±0.5-V supplies. The 
capacitor sizes used were C1 = 1.25 pF, C2 = 0.25 pF, and C3 = 0.25 pF, for a nominal gain 
of -5. The circuits of Figure 2 and Figure 5 were fabricated using a TSMC 0.35-µm 
double-poly four-metal CMOS technology. Figure 11 and Figure 12 show the photomicrographs 
of the circuits of Figure 2 and Figure 5, respectively. The chip areas of the circuits of 
Figure 2 and Figure 5, excluding bonding pads, are 414 × 278 µm² and 460 × 330 µm², 
respectively. 




Fig. 11. Photomicrograph of Fig. 2. 

Fig. 12. Photomicrograph of Fig. 5. 

The measured input/output waveforms for a 0.2-V peak-to-peak sinusoidal differential input 
signal are shown in Fig. 13 and Fig. 14 for the circuits of Fig. 2 and Fig. 5, respectively. 
The input signal was at 10 kHz, whereas the clock signal was at 1 MHz. It can be seen that 
the gain is very close to the nominal value of -5. 




Fig. 13. Measured differential input and output waveforms of Fig. 2 (f_clk = 1 MHz, 
f_in = 10 kHz, sinusoidal differential input voltage = 0.2 Vpp). 

Fig. 14. Measured differential input and output waveforms of Fig. 5 (f_clk = 1 MHz, 
f_in = 10 kHz, sinusoidal differential input voltage = 0.2 Vpp). 

Fig. 15 and Fig. 16 show the resulting output spectra. The even-order harmonics are largely 
attenuated by the fully differential topology, and spurious-free dynamic ranges (SFDR) of 
59 dB and 52 dB are exhibited by the circuits of Fig. 2 and Fig. 5, respectively. The 
circuits of Fig. 2 and Fig. 5 dissipate 206.5 µW and 206.6 µW, respectively, from a 1-V 
power supply. 

Fig. 15. Measured output spectrum of Fig. 2. 

Fig. 16. Measured output spectrum of Fig. 5. 



5. Conclusion 

Two fully differential CMOS 1-V switched-capacitor amplifiers have been described. Rail-to- 
rail operation of improved bootstrapped switches allows very low voltage robust switched- 
capacitor designs in standard CMOS technologies while avoiding transistor gate oxide 
overstress. The circuits have been fabricated and all aspects of their performance have been 
confirmed. 



6. References 

Abo, A. M. & Gray, P. R. (1999). A 1.5-V, 10-bit, 14.3-MS/s CMOS pipeline analog-to-digital converter, IEEE J. Solid-State Circuits, May, vol. 34, pp. 599-606, ISSN: 0018-9200. 
Au, S. & Leung, B. H. (1997). A 1.95-V, 0.34-mW, 12-b sigma-delta modulator stabilized by local feedback loops, IEEE J. Solid-State Circuits, March, vol. 32, pp. 321-328, ISSN: 0018-9200. 
Baschirotto, A. & Castello, R. (1997). A 1-V 1.8-MHz CMOS switched-opamp SC filter with rail-to-rail output swing, IEEE J. Solid-State Circuits, December, vol. 32, pp. 1979-1986, ISSN: 0018-9200. 
Chang, D. Y. & Moon, U.-K. (2003). A 1.4-V 10-bit 25-MS/s pipelined ADC using opamp-reset switching technique, IEEE J. Solid-State Circuits, August, vol. 38, pp. 1401-1404, ISSN: 0018-9200. 
Cheung, V. S.-L. et al. (2001). A 1-V CMOS switched-opamp switched-capacitor pseudo-2-path filter, IEEE J. Solid-State Circuits, January, vol. 36, pp. 14-22, ISSN: 0018-9200. 
Cheung, V. S.-L. et al. (2002). A 1-V 10.7-MHz switched-opamp bandpass ΣΔ modulator using double-sampling finite-gain-compensation technique, IEEE J. Solid-State Circuits, October, vol. 37, pp. 1215-1225, ISSN: 0018-9200. 




Cheung, V. S.-L. et al. (2003). A 1-V 3.5-mW CMOS switched-opamp quadrature IF circuitry for Bluetooth receivers, IEEE J. Solid-State Circuits, May, vol. 38, pp. 805-816, ISSN: 0018-9200. 
Crols, J. & Steyaert, M. (1994). Switched-opamp: an approach to realize full CMOS switched-capacitor circuits at very low power supply voltage, IEEE J. Solid-State Circuits, August, vol. 29, pp. 936-942, ISSN: 0018-9200. 
Dessouky, M. & Kaiser, A. (2001). Very low-voltage digital-audio ΣΔ modulator with 88-dB dynamic range using local switch bootstrapping, IEEE J. Solid-State Circuits, March, vol. 36, pp. 349-355, ISSN: 0018-9200. 
Keskin, M. et al. (2002). A 1-V 10-MHz clock-rate 13-bit CMOS ΣΔ modulator using unity-gain-reset opamps, IEEE J. Solid-State Circuits, July, vol. 37, pp. 817-824, ISSN: 0018-9200. 
Martin, K. et al. (1987). A differential switched-capacitor amplifier, IEEE J. Solid-State Circuits, February, vol. 22, pp. 104-106, ISSN: 0018-9200. 
Matsuya, Y. & Yamada, J. (1994). 1-V power supply, low-power consumption A/D conversion technique with swing-suppression noise shaping, IEEE J. Solid-State Circuits, December, vol. 29, pp. 1524-1530, ISSN: 0018-9200. 
Nicollini, G. A. et al. (1996). A -80-dB THD, 4-Vpp switched capacitor filter for 1.5-V battery-operated systems, IEEE J. Solid-State Circuits, August, vol. 31, pp. 1214-1219, ISSN: 0018-9200. 
Park, J.-B. et al. (2004). A 10-b 150-MSample/s 1.8-V 123-mW CMOS A/D converter with 400-MHz input bandwidth, IEEE J. Solid-State Circuits, August, vol. 39, pp. 1335-1337, ISSN: 0018-9200. 
Peluso, V. et al. (1997). A 1.5-V 100-µW ΣΔ modulator with 12-b dynamic range using the switched-opamp technique, IEEE J. Solid-State Circuits, July, vol. 32, pp. 943-952, ISSN: 0018-9200. 
Peluso, V. et al. (1998). A 900-mV low-power ΣΔ A/D converter with 77-dB dynamic range, IEEE J. Solid-State Circuits, December, vol. 33, pp. 1887-1897, ISSN: 0018-9200. 
Rabii, S. & Wooley, B. A. (1997). A 1.8-V digital-audio sigma-delta modulator in 0.8-µm CMOS, IEEE J. Solid-State Circuits, June, vol. 32, pp. 783-796, ISSN: 0018-9200. 
Rombouts, P. et al. (2001). A 13.5-b 1.2-V micropower extended counting A/D converter, IEEE J. Solid-State Circuits, February, vol. 36, pp. 176-183, ISSN: 0018-9200. 
Sauerbrey, J. et al. (2002). A 0.7-V MOSFET-only switched-opamp ΣΔ modulator in standard digital CMOS technology, IEEE J. Solid-State Circuits, December, vol. 37, pp. 1662-1669, ISSN: 0018-9200. 
Waltari, M. & Halonen, K. A. I. (2001). 1-V 9-bit pipelined switched-opamp ADC, IEEE J. Solid-State Circuits, January, vol. 36, pp. 129-134, ISSN: 0018-9200. 
Waltari, M. & Halonen, K. (1998). Fully differential switched opamp with enhanced common-mode feedback, Electron. Lett., November, vol. 34, no. 23, pp. 2181-2182, ISSN: 0013-5194. 
Wang, L. & Embabi, S. H. K. (2003). Low-voltage high-speed switched-capacitor circuits without voltage bootstrapper, IEEE J. Solid-State Circuits, August, vol. 38, pp. 1411-1415, ISSN: 0018-9200. 




Wu, P. Y. et al. (2007). A 1-V 100-MS/s 8-bit CMOS switched-opamp pipelined ADC using loading-free architecture, IEEE J. Solid-State Circuits, April, vol. 42, pp. 730-738, ISSN: 0018-9200. 
Yang, J. W. & Martin, K. W. (1989). High-resolution low-power D/A converter, IEEE J. Solid-State Circuits, October, vol. 24, pp. 1458-1461, ISSN: 0018-9200. 
Yoshizawa, H. et al. (1999). MOSFET-only switched-capacitor circuits in digital CMOS technology, IEEE J. Solid-State Circuits, June, vol. 34, pp. 734-747, ISSN: 0018-9200. 



Multi-Mode, Multi-Band Active-RC Filter 
and Tuning Circuits for SDR Applications 

Kang-Yoon Lee 

Konkuk University 
Republic of Korea 



1. Introduction 

The prevalence of wireless standards and the introduction of dynamic standards/applications, 
such as software-defined radio, necessitate filters with wide ranges of adjustable 
bandwidth/power, and with selectable degrees and shapes. The baseband filters 
of transceivers often utilize a significant portion of the power budget, especially when high 
linearity is required. Likewise, a widely tunable filter designed for its highest achievable 
frequency consumes more power than necessary when adjusted to its lowest frequency. 
Because power consumption is proportional to the dynamic range and frequency of 
operation, power-adjustable filters have recently gained popularity, as they can adapt their 
power consumption dynamically to meet the needs of the system. 

Dynamic variation in filter attributes (e.g. frequency, order, type) coupled with companies' 
desire to reuse IP has popularized highly programmable filters. The lowpass-filter cutoff 
frequencies of several wireless/wireline standards are within the 1-20 MHz frequency 
range. Many of these standards are irregularly spaced in frequency and do not lend 
themselves well to standard binary-weighted resistor arrays. In previous designs frequency 
is solely controlled digitally, and hence, digital circuitry or an ADC is used to tune the 
frequency of the filters. Gm-C filters offer continuous frequency tunability and can operate 
at higher frequencies than their active-RC counterparts. MOSFET-C filters can also provide 
continuous frequency tuning, but both Gm-C and MOSFET-C filters lack good linearity. 
MOSFET-C filters additionally suffer from reduced tuning range at lower supply voltages; 
also, in MOSFET-C most of the voltage drop occurs over nonlinear MOSFET triode resistors, 
which appreciably degrades its linearity. Along these lines, filters with good linearity have 
been developed that tune on the basis of duty-cycle control in switched-R-MOSFET-C filters; 
however, duty-cycle control by nature necessitates a discrete-time filter. Active-RC filters, 
known to have good linearity, have been used for continuous-time programmable filters. 
Some such filters have narrow frequency ranges, with a few others providing wider but 
solely discrete ranges. Purely discrete switched resistor tuning limits filter frequency tuning 
to discrete frequencies determined by the overall frequency range and number of bits used. 
Tuning to different frequency bands precisely would require very fine resistor stripes to 
meet precision requirements for the low-frequency end. 

Newer technologies offer increased integration with smaller feature sizes, allowing the filter 
to be on the same chip as the other transceiver blocks. This integration especially promotes 
reconfigurable architectures, as DSPs can be integrated with the transceiver and can control 
the mode of operation. However, as the minimum feature size shrinks, supply voltages are also 
reduced, which complicates classical linearization techniques for Gm-C filters. In this 
respect, active-RC filters possess an intrinsic linearity advantage. 

The third-generation standards will create a demand for cellular phones capable of 
operating both in the new wideband systems and in the existing narrower-band systems. 
Second-generation systems have channel bandwidths in the range of tens of kilohertz, whereas 
the channel bandwidths of wideband systems are in the megahertz range. The corner frequency 
of an analog channel-select filter must therefore be tunable over at least a decade of 
frequency, which increases the power consumption and the area. The proposed multi-mode 
baseband filter minimizes the area and optimizes the power consumption by sharing the 
capacitors and resistors. In addition, the new tuning method reduces the number of switches 
in the programmable capacitor arrays, which can be dominant noise sources. 

This chapter is organized as follows. In Section 2, the multi-mode, multi-band active-RC 
filter architecture is described. Section 3 describes the filter tuning circuits. Section 4 
shows experimental results from a 0.35 µm CMOS implementation, and Section 5 concludes the 
chapter. 



2. Multi-mode, multi-band active-RC filter architecture 
2.1 Multi-mode, multi-band active-RC low-pass filter 

Fig. 1. (a) Resistor matrices and capacitor matrices (b) schematic of the baseband filter. 

Fig. 1 shows the designed fifth-order active-RC Chebyshev filter. The resistor matrices and 
capacitor matrices are shown in Fig. 1(a). The resistor matrices are composed of resistors 
and switches. The switches are controlled by Mode_sel(3:0), as defined in Table 1. The 
resistor for WCDMA is not connected to any switch. 

The bandwidths of PDC, GSM, IS-95, and WCDMA are 13 kHz, 100 kHz, 630 kHz, and 2.1 
MHz, respectively. The Mode_sel(3:0) bits are set through the serial interface and 
represented in thermometer code. The corner frequency was made tunable by using 
programmable capacitor matrices, which are composed of capacitors and switches. Each mode 
requires 2 control bits, so a total of 8 bits is required. The tuning bits, cont(7:0), are 
determined by the on-chip tuning block based on the mode. In PDC mode, the low corner 
frequency leads to large passive components occupying a lot of die area [2]. Because the 
capacitor matrices dominate the area, capacitors were shared between modes. There are 
trade-offs between resistor values and capacitor values. When resistor values are reduced to 
lower the thermal noise, capacitor values become large, which leads to a large area. On the 
other hand, as capacitor values become smaller to reduce the area, the noise level rises. 
The capacitor and resistor values were therefore optimized. 
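
The resistor/capacitor trade-off described above can be put in rough numbers. The sketch below is not the actual filter design: it assumes a single RC section with corner frequency 1/(2πRC) and uses the standard thermal-noise density sqrt(4kTR) of one resistor; the resistor values and the temperature are assumed for illustration.

```python
import math

K_B = 1.380649e-23  # Boltzmann constant in J/K


def rc_tradeoff(f_c, r, temp=300.0):
    """Return (C, noise density) for one RC section with corner f_c.

    C is in farads and the thermal-noise density of R is in V/sqrt(Hz).
    """
    c = 1.0 / (2.0 * math.pi * r * f_c)
    v_n = math.sqrt(4.0 * K_B * temp * r)
    return c, v_n


# PDC corner frequency from the text (13 kHz); resistor values are assumed.
for r in (50e3, 100e3, 200e3):
    c, v_n = rc_tradeoff(13e3, r)
    print(f"R = {r/1e3:.0f} kOhm  C = {c*1e12:.1f} pF  noise = {v_n*1e9:.1f} nV/rtHz")
```

Halving R in this model doubles the capacitance needed for the same corner frequency while lowering the noise density only by a factor of sqrt(2), which is the area-versus-noise compromise the text refers to.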



Mode_sel(3:0)    Standard    Bandwidth
"0001"           PDC         13 kHz
"0010"           GSM         100 kHz
"0100"           IS-95       630 kHz
"1000"           WCDMA       2.1 MHz



Table 1. Mode definition and corresponding bandwidths 
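
For readers who want to script against the mode definitions, Table 1 can be captured as a small lookup. This is only a sketch: the dictionary and function names are hypothetical, and the bit patterns are copied from Table 1 as printed.

```python
# Mode_sel(3:0) patterns and bandwidths as listed in Table 1.
MODE_TABLE = {
    "0001": ("PDC", 13e3),
    "0010": ("GSM", 100e3),
    "0100": ("IS-95", 630e3),
    "1000": ("WCDMA", 2.1e6),
}


def decode_mode(mode_sel):
    """Return (standard, bandwidth in Hz) for a Mode_sel(3:0) bit string."""
    try:
        return MODE_TABLE[mode_sel]
    except KeyError:
        raise ValueError(f"undefined Mode_sel pattern: {mode_sel!r}") from None


widest = max(bw for _, bw in MODE_TABLE.values())
narrowest = min(bw for _, bw in MODE_TABLE.values())
print(decode_mode("0100"), widest / narrowest)
```

The ratio of the widest to the narrowest bandwidth comes out above 100, which gives a sense of the tuning span the shared resistor and capacitor matrices have to cover.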

2.2 Tunable active-RC complex band-pass filter 

Fig. 2 shows the Adjacent Channel Interference (ACI) of the PHS system. The nearest 
interferer is located at 600 kHz, and its magnitude is 50 dB larger than the wanted signal. 



Fig. 2. Characteristic of (a) Lowpass Filter in Direct-Conversion Receiver (b) Complex 
Bandpass Filter in Low-IF Receiver 

As shown in Fig. 2(a), if the direct-conversion receiver architecture is used, the interferers 
are located at ±600 kHz and can be attenuated by the lowpass filter. However, if the IF 
frequency is 150 kHz, the interferers are shifted to -450 kHz and +750 kHz, respectively, as 
shown in Fig. 2(b). If a lowpass filter is used in this case, the attenuation requirement is 
tighter because the worst-case interferer appears to be located at 450 kHz. Therefore, a 
complex bandpass filter whose center frequency is located at 150 kHz is designed to meet the 
ACS performance. 

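The transfer function of the complex bandpass filter is found by frequency translating a 
low-pass filter: 

$$H_{bp}(j\omega) = H_{lp}(j\omega - j\omega_c) \tag{1}$$

The translation of a single pole is given in Eq. (2) and (3): 

$$H_{lp}(j\omega) = \frac{1}{1 + j\omega/\omega_0} \tag{2}$$

$$H_{bp}(j\omega) = \frac{1}{1 - 2jQ + j\omega/\omega_0} \tag{3}$$

where ω_c is the center frequency, ω_0 is the low-pass pole frequency, and 
Q = ω_c/(2ω_0), which follows from substituting Eq. (1) into Eq. (2). 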



A single complex pole cannot be realized with a real filter; only complex pole pairs can be 
realized. The result of Eq. (2) is a single complex pole. The translated version of a single 
complex pole is also given by Eq. (3). The complex part must simply be added to or 
subtracted from the complex term 2jQ. 

Fig. 3. Block scheme for the realization of a single complex pole. 

The realization of a single complex pole is given in Fig. 3. It is nothing more than the 
direct synthesis of the transfer function. Fig. 3 is the full block schematic with building 
blocks for real signals. 
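
The effect of the frequency translation can be checked numerically with complex arithmetic. The sketch below evaluates a single translated pole, 1 / (1 - 2jQ + jf/f0) with Q = fc/(2 f0); the function names are hypothetical, and the pole frequency of 110 kHz is an assumed value used only to make the example concrete (the actual filter is a 3rd-order Chebyshev design).

```python
def h_lp(f, f0):
    """Single real low-pass pole, evaluated at frequency f (Hz)."""
    return 1.0 / (1.0 + 1j * f / f0)


def h_bp(f, f0, fc):
    """The same pole translated to center frequency fc (single complex pole)."""
    q = fc / (2.0 * f0)
    return 1.0 / (1.0 - 2j * q + 1j * f / f0)


F0, FC = 110e3, 150e3   # assumed pole frequency and the 150 kHz IF from the text
for f in (-450e3, -150e3, 150e3, 750e3):
    print(f"{f/1e3:7.0f} kHz  |H| = {abs(h_bp(f, F0, FC)):.3f}")
```

In this model the response peaks at +150 kHz and is not mirrored at -150 kHz, and the two shifted interferers at -450 kHz and +750 kHz see the same attenuation because both sit 600 kHz from the center frequency.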

Fig. 4(a) shows the designed 3rd-order Chebyshev complex bandpass filter. The wanted 
signal is composed of the in-phase signal and the quadrature signal, which are separated by 
90° in phase. The complex bandpass filter uses both signals to perform the complex 
operations. As shown in Fig. 4(a), the complex bandpass filter has an in-phase signal path 
and a quadrature-phase signal path. Internal nodes of each path are interconnected with the 
other path. Therefore, I/Q mismatch is one of the most critical design issues in the complex 
filter. In this design, an I/Q mismatch compensation scheme is applied, so the I/Q mismatch 
is drastically reduced. 

Fig. 4. (a) Schematic (b) resistor arrays (c) capacitor arrays of the complex bandpass filter 

Resistor arrays and capacitor arrays are shown in Fig. 4(b) and (c). The resistor arrays are 
composed of resistors and switches. They control the center frequency of the bandpass filter, 
and their control signals, rconti(6:0) and rcontq(6:0), are set through the serial interface 
and represented in thermometer code. 

The corner frequency was made tunable by using programmable capacitor arrays, which are 
composed of capacitors and switches. The tuning bits, ccont(1:0), are determined by the 
on-chip tuning block. There are trade-offs between resistor values and capacitor values. 
When resistor values are reduced to lower the thermal noise, capacitor values become large, 
which leads to a large area. On the other hand, as capacitor values become smaller to reduce 
the area, the noise level rises. The capacitor and resistor values were therefore optimized. 

3. Tuning circuit of active-RC filter 

Resistors and capacitors usually vary by about ±15% due to process variation. In 
continuous-time filters, this leads to a large variation of the corner frequency, which in 
most cases must be compensated by adjusting the component values. 
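
To see how large the resulting spread can be, the worst-case corner frequency can be estimated under a simple assumption: the corner scales as 1/(RC), and R and C each deviate independently by up to roughly 15% in either direction. The function name below is hypothetical and the estimate ignores any correlation between the two variations.

```python
def corner_spread(tol=0.15):
    """Worst-case corner-frequency scale factors when R and C each vary by tol.

    Returns (lowest, highest) relative to the nominal corner frequency.
    """
    lowest = 1.0 / ((1.0 + tol) ** 2)    # both R and C at their maximum
    highest = 1.0 / ((1.0 - tol) ** 2)   # both R and C at their minimum
    return lowest, highest


lo, hi = corner_spread()
print(f"corner frequency between {lo:.3f}x and {hi:.3f}x of nominal")
```

Under these assumptions the untuned corner frequency can land anywhere from about 24% below to about 38% above its nominal value, which is the range the tuning circuit has to absorb.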

A conventional fully analog tuning circuit based on a VCO is shown in Fig. 5(a). However, 
this tuning circuit is not suitable for tuning an active-RC filter with programmable 
capacitor matrices: the output of the loop filter in the PLL is an analog voltage, which 
cannot be interfaced directly with the capacitor matrices. 

On the other hand, too many digital bits are required for fine resolution in the conventional 
fully digital circuit shown in Fig. 5(b). Thus, the area and noise level are too high due to 
the large number of switches and capacitor matrices. 

Fig. 5. (a) Full analog tuning method (b) Full digital tuning method. 

The concepts of the full analog, full digital and proposed two-step tuning method are shown 
in Fig. 6(a), (b) and (c), respectively. 

The block diagram of the proposed two-step tuning scheme is shown in Fig. 7. The clock 
generator provides the clocks clk0 and clk1 to the coarse and fine tuning controllers. Ctu is 
charged while clk0 is high, and VCOMP is sampled by clk1. The comparator reference 
voltages Vref, RefL, RefH, and RefM are generated in the reference voltage generator block. 
The operation is as follows. Before the main capacitor tuning steps, the reference tuning loop 
is enabled to compensate for the resistor variation. Pbias is compared with Vref, and 
vres(2:0) is adjusted according to the result. When Pbias is larger than Vref, the resistor 
load should be smaller, so vres(2:0) is increased. Conversely, if Pbias is smaller than Vref, 
the resistor load should be larger, so vres(2:0) is decreased. Reference tuning is complete 
when Pbias crosses Vref. 

After the reference tuning, the main capacitor tuning is done in two steps: coarse tuning 
and fine tuning. 

Fig. 6. Concept of (a) full analog tuning method (b) full digital tuning method (c) proposed 
two-step tuning method. 

Fig. 7. Block diagram of the proposed two-step tuning method. 
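
The reference tuning loop described above can be sketched as a simple comparator-driven search. The code is a behavioral illustration only: `reference_tune` and `example_pbias` are hypothetical names, the starting code is arbitrary, and the assumed relation between vres(2:0) and Pbias (Pbias falling by 0.1 V per code) is invented for the example.

```python
def reference_tune(pbias_of, vref, code=4, max_code=7):
    """Step the 3-bit vres code until Pbias crosses Vref; return the final code.

    pbias_of maps a vres code to the resulting Pbias voltage (model supplied
    by the caller). The code is increased while Pbias is above Vref and
    decreased while it is below, as described in the text.
    """
    started_above = pbias_of(code) > vref
    step = 1 if started_above else -1
    while 0 <= code + step <= max_code:
        code += step
        if (pbias_of(code) > vref) != started_above:
            break                      # Pbias has crossed Vref: tuning done
    return code


def example_pbias(code):
    # Hypothetical model: a smaller resistor load (higher code) lowers Pbias.
    return 1.6 - 0.1 * code


print(reference_tune(example_pbias, vref=1.05))
print(reference_tune(example_pbias, vref=1.35))
```

If the code range is exhausted before a crossing occurs, this sketch simply stops at the end of the range; how the actual circuit handles that case is not stated in the text.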

Fig. 8 shows the timing diagram of the proposed two-step tuning method. 

[Timing diagram: Step 1 (coarse tuning) followed by Step 2 (fine tuning); signals shown: 
CLK1, Coarse_Lock, CAPS(1:0), Fbias.] 

Fig. 8. Timing diagram of the proposed tuning method. 

Fig. 9. (a) Block diagram (b) timing diagram of the replica filter in coarse tuning block. 

Fig. 9 shows the block diagram and the timing diagram of the replica filter in the coarse 
tuning block. The VCOMP voltage is precharged while clk0 and clk1 are high. When clk0 
goes from high to low, the VCOMP voltage is determined by Eq. (4). 

$$VCOMP = V_{com} - \frac{1}{RC}\,(V_{ref} - V_{com})\,t \tag{4}$$



First, CCONT(1:0) is tuned until VCOMP lies within a predetermined range. VCOMP is 
compared with refL and refH. When VCOMP is higher than refH, CCONT(1:0) is increased, 
whereas if VCOMP is lower than refL, CCONT(1:0) is decreased. When VCOMP lies between 
refL and refH, the coarse_lock signal goes from low to high. Usually, many tuning 
capacitance levels are required for fine resolution, but only two bits are sufficient in 
this design with the two-step tuning method. After the coarse_lock signal is asserted, the 
corner frequency is tuned by the fine tuning control block. 

Fbias controls the tail current of the op-amp, so the DC gain of the op-amp changes 
according to the Fbias voltage. If the DC gain of the op-amp were infinite, the cut-off 
frequency would not change. However, because the DC gain of the op-amp is finite, the 
cut-off frequency of the filter changes as the DC gain of the op-amp changes. VCOMP is 
compared with refM. When VCOMP is larger than refM, Fbias should be increased. On the other 
hand, Fbias should be decreased when VCOMP is smaller than refM. The range of Fbias is 
0.8 V to 1.2 V. The bandwidth of the op-amp is adjustable according to the mode to save 
power, and the transistors in the op-amp are sized very large to reduce the 1/f noise. 
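
The two-step procedure can be summarized in a behavioral sketch. Every numeric value below is a hypothetical choice made so the example runs (voltages, thresholds, capacitor sizes, and the linear dependence of the effective time constant on Fbias); only the decision rules, the 2-bit code, and the 0.8 V to 1.2 V Fbias range come from the text. VCOMP is modeled as an integrator output that falls as the RC product grows, with V_ref assumed to be below V_com.

```python
V_COM, V_REF = 1.5, 1.0                  # assumed integrator voltages (V_REF < V_COM)
REF_L, REF_M, REF_H = 1.95, 2.00, 2.05   # assumed comparator thresholds, V
T_INT = 1e-6                             # assumed integration time, s
R_NOM = 100e3                            # assumed nominal resistance, ohm
C_FIXED, C_UNIT = 7.8e-12, 1.6e-12       # assumed capacitor array, F
FBIAS_GAIN = 0.5                         # assumed time-constant sensitivity to Fbias, 1/V


def vcomp(code, fbias, rc_scale):
    """Model of VCOMP for a capacitor code, Fbias, and process scale factor."""
    c = C_FIXED + code * C_UNIT
    rc = R_NOM * c * rc_scale * (1.0 + FBIAS_GAIN * (fbias - 1.0))
    return V_COM - (V_REF - V_COM) * T_INT / rc


def coarse_tune(rc_scale, code=0, fbias=1.0):
    """Step 1: adjust the 2-bit code until VCOMP lies between refL and refH."""
    for _ in range(8):
        v = vcomp(code, fbias, rc_scale)
        if v > REF_H and code < 3:
            code += 1
        elif v < REF_L and code > 0:
            code -= 1
        else:
            break
    locked = REF_L <= vcomp(code, fbias, rc_scale) <= REF_H
    return code, locked


def fine_tune(code, rc_scale, fbias=1.0, step=0.005, tol=1e-3):
    """Step 2: move Fbias within 0.8 V to 1.2 V until VCOMP is close to refM."""
    for _ in range(100):
        v = vcomp(code, fbias, rc_scale)
        if abs(v - REF_M) <= tol:
            break
        fbias += step if v > REF_M else -step
        fbias = min(1.2, max(0.8, fbias))
    return fbias


for scale in (0.75, 1.0, 1.3):           # assumed process scale factors on R*C
    code, locked = coarse_tune(scale)
    fb = fine_tune(code, scale)
    print(scale, code, locked, round(fb, 3), round(vcomp(code, fb, scale), 4))
```

The design choice this illustrates is that the coarse window (refL to refH) only needs to be narrow enough for the continuous Fbias adjustment to cover it, which is why a small number of capacitor codes can suffice.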

4. Experimental results 

4.1 Multi-mode, multi-band active-RC low-pass filter 

The multi-mode, multi-band active-RC low-pass filter was fabricated using a 0.35 µm CMOS 
process. The chip area is 3.8 mm² and the supply voltage is 3 V. 

Fig. 10. Amplitude response of the filter in WCDMA mode. 

Fig. 10 shows the amplitude response of the filter in WCDMA mode at every code. The 
cut-off frequency is 2.1 MHz in WCDMA mode. The current consumptions are 5.2 mA, 6.1 mA, 
7.1 mA, and 8.4 mA for PDC, GSM, IS-95, and WCDMA, respectively. The current in PDC mode 
is less than that in WCDMA mode because the gain and the bandwidth of the op-amp are 
smaller in PDC mode. The frequency tuning range is from 10 kHz to 3 MHz. Out-of-band IIP3 
was determined by performing an IM3 test. In PDC mode, when two tones of +15 dBm at 20 kHz 
and 30 kHz are applied, IM3 is -77 dBm. In WCDMA mode, out-of-band IM3 is -68 dBm when 
two tones of +14 dBm at 1.8 MHz and 3.0 kHz are applied. The input-referred average 
passband noise densities of the filter are 250, 130, 85, and 54 nV/√Hz for PDC, GSM, IS-95, 
and WCDMA, respectively. The passband ripple is less than 0.5 dB in all modes. The stopband 
rejections are 79, 79, 75, and 75 dB for PDC, GSM, IS-95, and WCDMA, respectively. Table 2 
summarizes the performance of the filter. 

Technology                 0.35 µm CMOS
Chip area                  3.8 mm²
Supply voltage             3 V
Tuning range               10 kHz ~ 3 MHz

                           PDC     GSM     IS-95   WCDMA
Current (mA)               5.2     6.1     7.1     8.4
IIP3 (dBm)                 28      25      23      21
Noise (nV/√Hz)             250     130     85      54
Passband ripple (dB)       0.5     0.5     0.5     0.5
Stopband rejection (dB)    79      79      75      75



Table 2. Performance summary 

4.2 Tunable active-RC complex band-pass filter 

The complex bandpass filter was fabricated using a 0.35 µm CMOS process. The chip area is 
3.8 mm². The supply voltage is 3 V. Fig. 11 shows the microphotograph. 




Fig. 11. Chip microphotograph 

Fig. 12 shows the measured amplitude response of the complex bandpass filter when the 
temperature is changed from -10°C to 60°C. The cut-off frequency is 150 kHz ± 110 kHz. The 
cut-off frequency remains almost constant as the temperature changes, owing to the proposed 
filter tuning method. 

Fig. 12. Measured amplitude response of the complex bandpass filter 

The power consumption is 13 mW. The frequency tuning range is 100 kHz, which is 50% of the 
signal bandwidth. That is, the cut-off frequency can be adjusted by 100 kHz even when it is 
shifted by temperature, supply voltage, and process variations. 
Out-of-band IIP3 was determined by performing an IM3 test. When two tones of -44 dBm at 
600 kHz and 1.2 MHz are applied, IIP3 is +25 dBm. 

The input-referred average passband noise density of the filter is 85 nV/√Hz. The passband 
ripple is less than 0.8 dB, and the stopband rejection at -450 kHz is 66 dB. Table 3 
summarizes the performance of the filter. 



Technology              0.35 µm CMOS
Chip area               3.8 mm²
Supply voltage          3 V
Tuning range            100 kHz
Power (mW)              13
IIP3 (dBm)              25
Noise (nV/√Hz)          85
Passband ripple (dB)    0.8



Table 3. Performance summary 



5. Conclusion 

A CMOS multi-mode, multi-band low-pass filter and a complex bandpass filter have been 
presented. Capacitors and resistors were shared to minimize the area. The proposed two-step 
tuning method reduces the number of switches and thus the noise and the area. 



1. Introduction 

The demand for data communication is growing rapidly due to the increasing popularity of 
the Internet and other factors [1]. The unprecedented success of information technology in 
recent years has ushered in explosive growth in demand for Internet access in the 21st 
century. Ultra-wideband transmission media are needed in order to provide high-speed 
communications for a much larger number of users. A promising solution to the capacity 
crunch can come from wavelength-division-multiplexed optical fiber communication 
systems, which have been shown to provide enormous capacities, on the order of terabits per 
second, over long distances. These systems utilize single-mode fibers, in conjunction with 
erbium-doped fiber amplifiers, as the transmission medium [2]. Silica optical fiber might be 
the only proper choice to realize this task. Optical fiber based communication is an 
excellent alternative for these purposes, which require low dispersion and dispersion slope 
as well as a large bandwidth supported by the optical physical medium [3]. Optical 
transmission began to be used in trunk cables about 1990; the capacity of those systems was 
several hundred Mbit/s per fiber. The capacity jumped to 2.5 (5.0) Gbit/s per fiber with the 
introduction of optical repeaters using erbium-doped fiber amplifiers in 1995. It jumped 
again in 1998, to 10 (20) Gbit/s per fiber, with the introduction of wavelength division 
multiplexing (WDM) [5]. Overall transmission capacity now exceeds 100 Gbit/s per fiber due 
to improvements in WDM techniques [6]. Usually, those techniques are called dense WDM 
(DWDM). In recent years, the increasing demands for transmission capacity have led to 
intense research activities on high-capacity DWDM communication systems [7]. 

Nowadays, applications such as optical time division multiplexing (OTDM) and dense
wavelength division multiplexing (DWDM) are common in industry [1]. In view of these
applications, providing large bandwidth and high-speed communication over optical fibers
is of great interest [3].

In the following, we review the requirements for DWDM, dispersion properties, optical
nonlinearity, loss properties, and the design of optical fibers for DWDM.

2. Requirements for DWDM 

The number of wavelengths (channels) carried in the fibers of DWDM systems is increasing.
As discussed earlier, a WDM signal typically occupies a bandwidth of 30 nm or more,
although it is bunched in spectral packets of bandwidth ~0.1 nm (depending on the bit rate
of individual channels) [1]. To support such new optical transmission systems, the DWDM
fiber must overcome three transmission-related limitations: dispersion, optical fiber
nonlinearity, and loss. For 10-Gb/s channels, third-order dispersion does not play an
important role because relatively wide (> 10 ps) optical pulses are used for individual
channels. However, because of the wavelength dependence of β2, or equivalently of the
dispersion parameter D, the accumulated dispersion differs from channel to channel [4].
Wavelength division multiplexing (WDM) systems have been widely introduced for
large-capacity transmission. To further increase the transmission capacity, several
techniques have been investigated, such as higher bit-rate transmission [8], enhancement of
the spectral efficiency [9], and use of new transmission bands [10, 11]. In those systems,
optical fibers are increasingly required to have reduced nonlinearity and dispersion slope [5].
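
The channel-to-channel difference in accumulated dispersion can be illustrated with a short
calculation. The sketch below assumes a linear model D(λ) = D_ref + S (λ - λ_ref); the
function name and the fiber values (3 ps/nm/km at 1550 nm, slope 0.06 ps/nm²/km, 100 km
span) are hypothetical and chosen only for illustration.

```python
def accumulated_dispersion(wavelength_nm, d_ref, slope, ref_nm, length_km):
    """Accumulated dispersion (ps/nm) of one channel under a linear D(lambda) model."""
    # Local dispersion in ps/(nm km) at this channel wavelength.
    d_local = d_ref + slope * (wavelength_nm - ref_nm)
    return d_local * length_km


# Hypothetical three-channel system on a 100 km span (illustrative values only).
channels_nm = [1530.0, 1550.0, 1570.0]
acc = [accumulated_dispersion(w, 3.0, 0.06, 1550.0, 100.0) for w in channels_nm]

# Difference between the band edges that a compensator would have to handle.
spread = max(acc) - min(acc)
```

With a smaller slope the edge channels accumulate nearly the same dispersion as the center
channel, which is why a low dispersion slope eases compensation over a wide band.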

3. Dispersion properties 

Silica fibers suffer from some disadvantages, especially dispersion and dispersion slope,
and these two factors severely restrict high-speed pulse propagation [1]. In conventional
optical fibers the dispersion increases with wavelength. Because different channels are
therefore broadened by different amounts, multi-channel operation is hard to realize. A
suitable optical fiber should have both small dispersion and small dispersion slope over
the predefined wavelength interval. The dispersion properties of interest are thus the
dispersion itself and the dispersion slope of the optical fiber. The dispersion cannot be in
the zero-value region because, in DWDM systems, zero dispersion leads to phase matching
between the optical channels, and FWM then causes interaction between the signals.
Therefore, the dispersion in the signal wavelength region must have a suitable non-zero
value. The sign of the dispersion should be positive for short-distance transmission and
negative for ultra-long-distance transmission [12], because of modulation instability under
positive dispersion in a long link. As the signal wavelength band becomes wider, the
difference between the dispersion values at the edges of the band becomes larger, and
dispersion compensation becomes difficult for long-distance DWDM transmission. To achieve
both long-distance and high-speed transmission with easy dispersion compensation over a
wide wavelength band, the dispersion slope should be reduced. For single-wavelength
communication, dispersion shifted fiber is sufficient, but for applications such as DWDM it
cannot support high-speed operation. In these applications, the physical medium should
ideally provide flat, minimal, and uniform dispersion as well as a small dispersion slope.
An important limitation induced by chromatic dispersion and its slope is the pulse
broadening factor, which restricts the bit rate.

To minimize pulse broadening in an optical fiber, the chromatic dispersion should be low 
over the wavelength range used. A fiber in which the chromatic dispersion is low over a 
broad wavelength range is called a dispersion-flattened fiber. 

4. Optical nonlinearity 

The response of any dielectric to light becomes nonlinear for intense electromagnetic fields,
and optical fibers are no exception. Even though silica is intrinsically not a highly
nonlinear material, the waveguide geometry that confines light to a small cross section over
long fiber lengths makes nonlinear effects quite important in the design of modern lightwave
systems [1]. The much higher power level due to simultaneous transmission of multiplexed
channels, together with propagation over the much longer distances made possible by fiber
amplifiers, causes the otherwise weak and negligible fiber nonlinearities to affect signal
transmission significantly [2]. When the number of signal wavelengths carried in an optical
fiber increases, the average transmission power density becomes larger than in conventional
systems. Consequently, optical fiber nonlinearities have emerged as a main issue. This
nonlinearity seriously limits transmission capacity through various nonlinear interactions,
which are generally categorized as scattering effects and optical signal interactive
modulation. Because the signal power density is higher owing to the greater number of
channels in DWDM systems, optical fiber nonlinearity limits the number and spacing of the
channels and the length and speed of the transmission. In general, the refractive index of
optical fiber has a weak dependence on optical intensity, which equals the signal power (P)
per effective area (Aeff) in the fiber. Optical fiber nonlinearity arises from modulation of
the refractive index caused by changes in the optical intensity of the signal. As a result,
four wave mixing (FWM), self-phase modulation (SPM), and cross phase modulation (XPM) can
be observed in the fiber. XPM and SPM distort the signals. Therefore, optical fiber
nonlinearity must be reduced. The most practical way to do this is to enlarge Aeff [13].
Aeff increases with the mode field diameter (MFD), so enlarging the MFD is a practical
route to low nonlinearity. The choice of dispersion shifted fibers (DSFs) along with
erbium-doped fiber amplifiers (EDFAs), operating in the 1550 nm window, would be ideal for
achieving greater transmission distance and using the full capacity of the transmission
system [5,7,15]. However, when the system is operated at the zero dispersion wavelength, the
nonlinear interaction between the channels and noise components increases. Operating the
system slightly away from the zero dispersion wavelength reduces these unwanted
interactions. A WDM system therefore reduces the nonlinear effects and enables
multi-wavelength transmission by using non-zero dispersion shifted fibers with very small
dispersion in the range 1530-1610 nm. To increase the information carrying capacity, the
latest high-speed communication systems are based on dense wavelength division
multiplexing/demultiplexing (DWDM) [16, 17]. In such systems, nonlinear effects such as four
wave mixing (FWM), which arise from simultaneous transmission at many closely spaced
wavelengths and the high optical gain of the EDFA, impose serious limitations on the use of
a DSF with its zero dispersion wavelength at 1550 nm [18,19]. To overcome this difficulty,
nonzero dispersion shifted fibers with small dispersion in the range ~2-4 ps/km/nm over the
entire gain window of the EDFA have been proposed [20, 21]. In such fibers, the phase
matching condition is not satisfied because of the small but nonzero dispersion, and hence
the effect of FWM becomes negligible [15].
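
The benefit of a larger effective area can be made concrete with the standard nonlinear
coefficient γ = 2π n2 / (λ Aeff). The following sketch is illustrative only: the nonlinear
index n2 = 2.6e-20 m²/W is an assumed typical value for silica, not a figure from this
chapter, and the function name and the two effective areas are hypothetical.

```python
import math

# Assumed typical nonlinear index of silica in m^2/W (not taken from the text).
N2_SILICA = 2.6e-20


def nonlinear_coefficient(a_eff_um2, wavelength_um=1.55, n2=N2_SILICA):
    """Return gamma = 2*pi*n2 / (lambda * A_eff) in units of 1/(W km)."""
    a_eff_m2 = a_eff_um2 * 1e-12          # um^2 -> m^2
    wavelength_m = wavelength_um * 1e-6   # um -> m
    gamma_per_w_m = 2.0 * math.pi * n2 / (wavelength_m * a_eff_m2)
    return gamma_per_w_m * 1e3            # 1/(W m) -> 1/(W km)


# Hypothetical effective areas: doubling A_eff halves the nonlinear coefficient.
gamma_small = nonlinear_coefficient(50.0)
gamma_large = nonlinear_coefficient(100.0)
```

Because γ scales as 1/Aeff, any design change that enlarges the effective area lowers the
nonlinear phase accumulated per unit power and length by the same factor.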

5. Loss properties 

Progress in optical fiber fabrication technologies has resulted in the routine production of
low-loss single-mode fibers, which makes single-mode fibers promising for high-bit-rate,
long-haul optical transmission systems. Structural optimization is required to provide the
desired transmission characteristics for given operating conditions. Basic design
considerations take into account transmission characteristics such as intrinsic fiber loss,
bending loss, splice loss, and launching efficiency [22, 23]. The use of commercially
available erbium doped fiber amplifiers (EDFAs), which forces optical communication systems
to operate in the 1550 nm window, has significantly reduced the link length limitation
imposed by attenuation in the optical fiber [15]. Fiber loss is one of the significant
restrictions in optical fiber communication links. It is one of the factors that limit the
maximum distance over which information can be sent without repeaters. Because of the loss,
the pulse amplitude decreases until the initial information can no longer be recovered under
noisy conditions. For this reason, fiber designers prefer to shift the zero dispersion
wavelength to the region where the fiber has its lowest attenuation. The combined intrinsic
attenuation has a global minimum around 1.55 µm, which is why most optical communication
systems are operated at this wavelength [4, 18]. Another loss that must be taken into
account in fiber design is the bending loss. Whenever an optical fiber is bent, radiation
occurs: a portion of the power propagating in the cladding is lost through radiation.

6. Design of DWDM fiber 

There are three ways to increase the capacity of a DWDM transmission system: using a broad
wavelength range, narrowing the channel spacing, and increasing the bit rate per channel.
However, a disadvantage of the last two methods is the degradation of transmission
performance due to optical nonlinear effects. In this area, three categories cover all
designs. They are based on zero dispersion shifted fibers (ZDSFs), non-zero dispersion
shifted fibers (NZDSFs), and dispersion flattened fibers (DFFs).

7. Zero Dispersion Shifted Fibers (ZDSFs) 

The use of commercially available erbium doped fiber amplifiers (EDFAs), which forces
optical communication systems to operate in the 1550 nm window, has significantly reduced
the link length limitation imposed by attenuation in the optical fiber. However,
high-bit-rate (~10 Gb/s) data transmission can be limited by the large inherent dispersion
of the fiber. Dispersion shifted fibers (DSFs), which have zero dispersion around 1550 nm,
have been proposed and developed to overcome this problem. Dispersion shifted fiber is a
suitable choice for single-wavelength optical communication. The much higher power level
due to simultaneous transmission of multiplexed channels, together with propagation over
the much longer distances made possible by fiber amplifiers, causes the otherwise weak and
negligible fiber nonlinearities to affect signal transmission significantly. The effects of
fiber nonlinearities on pulse propagation and on the capacity of fiber optic communication
systems have been studied extensively. To mitigate the nonlinear effects in long fiber
optic communication systems based on zero dispersion shifted fiber, a new generation of
optical fibers, referred to as large effective area fibers, has been introduced. As noted
earlier, increasing the effective area reduces nonlinear effects, so combining zero
dispersion with a large effective area is an appropriate solution. Large effective area
fibers allow a much lower light intensity inside the guiding region, resulting in less
refractive index nonlinearity than in conventional single-mode fibers. In addition to
reduced nonlinearity, large effective area fibers must also provide low attenuation, low
bending and micro-bending losses, low chromatic dispersion, and low polarization mode
dispersion. In recent years, a variety of large effective area fiber designs have been
reported in the literature. These designs may be broadly classified into two groups based
on their refractive index profiles: R-type and M-type. Each of the two types is further
divided into two categories, type I and type II. A small pulse broadening factor (small
dispersion and dispersion slope), small nonlinearity (large effective area), and low
bending loss (small mode field diameter) are required as the design parameters in zero
dispersion shifted fibers [24]. The performance of a design may be assessed in terms of the
quality factor. This dimensionless factor captures the trade-off between the mode field
diameter, which is an indicator of bending loss, and the effective area, which provides a
measure of signal distortion owing to nonlinearity [25]. It is also difficult to realize a
dispersion shifted fiber with a small dispersion slope. Here, we present an optimized MII
triple-clad optical fiber that performs well in terms of dispersion and dispersion slope
[24]. The refractive index profile of the MII fiber structure is shown in Fig. 1. When the
LP approximation [26] is used to calculate the electric field distribution, there are two
regions of operation, and the guided modes and propagation wave vectors can be obtained
from two determinants constructed from the boundary conditions [27].

Fig. 1. Refractive index profile for the MII structure.

For the calculation of dispersion and dispersion slope, the following parameters are used.
where P and Q are geometrical parameters. The optical parameters of the structure are
defined as follows.
The index of refraction difference between core and cladding is defined as follows.

Here, we propose a novel methodology that makes the design procedure systematic. It relies
on an optimization technique based on the Genetic Algorithm (GA). A GA belongs to a class
of evolutionary computation techniques [28] based on models of biological evolution. This
method has proved useful in domains that are not well understood and in search spaces that
are too large to be searched efficiently by standard methods. Here, we concentrate on
dispersion and dispersion slope simultaneously in order to achieve small dispersion and
small dispersion slope over the predefined wavelength interval. Our goal is to propose a
special fitness function that optimizes the pulse broadening factor. To achieve this, we
have defined a weighted fitness function. The weighting function is necessary to describe
the relative importance of each subset in the fitness function [24]; in other words, the
pulse broadening factor is given a different coefficient at each wavelength. To weight this
factor over the predefined wavelength interval, we use a Gaussian weighting function. The
central wavelength (λ0) and the Gaussian parameter (σ) are used to manipulate the proposed
fitness function and thereby the dispersion and dispersion slope of the system. To describe
the fiber structure, we consider three optical and three geometrical parameters. In the GA,
the problem therefore has six genes, which represent those parameters. The initial ranges
of the parameters were chosen after some preliminary examinations. The initial population
has 50 chromosomes, which approximately cover the search space. From the initial
population, the dispersion (β2) and dispersion slope (β3), which are the important
parameters in the proposed fitness function, can be calculated. The elites are then
selected to survive into the next generation. Gradually the fitness function converges to
the minimum point of the search zone, with appropriate dispersion and slope. Equation (6)
shows our proposal for the weighted fitness function of the pulse broadening factor.

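F = \sum_{\lambda} \sum_{Z} \exp\left[-\frac{(\lambda-\lambda_0)^2}{2\sigma^2}\right] \left[1 + \left(\frac{\beta_2 Z}{t_i^2}\right)^2 + \left(\frac{\beta_3 Z}{2 t_i^3}\right)^2\right]^{1/2}    (6)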

where λ0, σ, t_i, Z, β2, and β3 are the central wavelength, the Gaussian parameter, the
full width at half maximum, the distance, and the second- and third-order derivatives of
the wave vector, respectively. In the fitness function of Eq. (6), the internal summation
is included to account for the broadening factor at each length up to 200 km. Running the
GA minimizes the fitness function, so small dispersion and small dispersion slope are
achieved. This condition also corresponds to the maximum values of the dispersion length
and the higher-order dispersion length. With this proposal, the zero dispersion wavelength
can be shifted to the central wavelength (λ0). Since the weight of the pulse broadening
factor at λ0 is greater than at other wavelengths in the weighted fitness function, the
zero dispersion wavelength is more likely to be found at λ0 than elsewhere. Meanwhile, the
flattening of the dispersion curve is controlled by the Gaussian parameter (σ). In other
words, increasing σ broadens the Gaussian weighting function over the predefined wavelength
interval. As a result, the pulse broadening factor is weighted more heavily at wavelengths
away from λ0, which considerably decreases the dispersion slope in the interval.
Consequently, the zero dispersion wavelength and the dispersion slope can be tuned by λ0
and σ, respectively. The advantage of this method is that it introduces two parameters (λ0
and σ) in place of the multiple optical and geometrical design parameters, which makes
system design easy.
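
A Gaussian-weighted fitness of this kind can be sketched in a few lines. The code below is
a minimal illustration under stated assumptions, not the authors' implementation: the
broadening factor uses the standard textbook form for an unchirped Gaussian pulse, σ = 0 is
treated as a weight concentrated entirely at λ0, and the dispersion samples, function
names, and distance grid are hypothetical.

```python
import math


def broadening_factor(beta2, beta3, z_km, t_i_ps):
    """Broadening factor of an unchirped Gaussian pulse after z_km of fiber.

    beta2 is in ps^2/km and beta3 in ps^3/km. The coefficients follow the
    standard textbook expression and are an assumption of this sketch.
    """
    second = beta2 * z_km / t_i_ps**2
    third = beta3 * z_km / (2.0 * t_i_ps**3)
    return math.sqrt(1.0 + second**2 + third**2)


def gaussian_weight(lam_um, lam0_um, sigma_um):
    """Gaussian weight centred on lam0; sigma = 0 is treated as a delta (assumption)."""
    if sigma_um == 0.0:
        return 1.0 if lam_um == lam0_um else 0.0
    return math.exp(-((lam_um - lam0_um) ** 2) / (2.0 * sigma_um**2))


def weighted_fitness(profile, lam0_um, sigma_um, distances_km, t_i_ps):
    """profile maps wavelength (um) -> (beta2, beta3) sampled at that wavelength."""
    total = 0.0
    for lam_um, (beta2, beta3) in profile.items():
        # Inner sum over distance, outer sum over wavelength with Gaussian weight.
        inner = sum(broadening_factor(beta2, beta3, z, t_i_ps) for z in distances_km)
        total += gaussian_weight(lam_um, lam0_um, sigma_um) * inner
    return total


# Hypothetical dispersion samples, for illustration only.
distances_km = [50.0, 100.0, 150.0, 200.0]
flat = {1.50: (0.0, 0.0), 1.55: (0.0, 0.0), 1.60: (0.0, 0.0)}
detuned = {1.50: (0.0, 0.0), 1.55: (0.5, 0.0), 1.60: (0.0, 0.0)}
f_flat = weighted_fitness(flat, 1.55, 0.0, distances_km, 5.0)
f_detuned = weighted_fitness(detuned, 1.55, 0.0, distances_km, 5.0)
```

A profile with residual dispersion at λ0 scores worse (larger) than one with zero dispersion
there, which is the property the GA exploits when it minimizes the fitness.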

The flowchart given in Fig. 2 explains the foregoing design strategy clearly.

Fig. 2. The scheme of the design procedure 

To illustrate the capability of the suggested technique and weighted fitness function, the
MII triple-clad optical fiber is studied, and the simulated results are presented below. In
the figures, we consider four simulation categories: dispersion-related quantities,
nonlinear behavior of the proposed fibers, electric field distribution in the structures,
and fiber losses.

For all the simulations, we take λ0 = 1500 and 1550 nm and σ = 0, 0.027869, and 0.036935 µm
as design constants. To apply the GA for optimization, we use the search space given in
Table 1, with each parameter treated as a gene. These intervals are chosen according to two
criteria: the designed structure must be practical to manufacture and must have a high
probability of supporting only one propagating mode [24].



Parameter   a (µm)    P          Q          R1           R2                  Δ
Range       [2-2.6]   [0.4-0.9]  [0.1-0.7]  [0.05-0.99]  [(-0.99)-(-0.05)]   [2x10^-3 - 1x10^-2]



Table 1. Optimization Search Space of Optical and Geometrical Parameters 
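
The search space of Table 1 maps naturally onto a six-gene chromosome. The sketch below
shows one way the initial population of 50 chromosomes and the elite selection step might
be set up; the function names, the uniform sampling, and the placeholder fitness are
assumptions made for illustration and are not taken from the chapter.

```python
import random

# Gene ranges taken from Table 1 (a in micrometres, the rest dimensionless).
SEARCH_SPACE = {
    "a_um": (2.0, 2.6),
    "P": (0.4, 0.9),
    "Q": (0.1, 0.7),
    "R1": (0.05, 0.99),
    "R2": (-0.99, -0.05),
    "Delta": (2e-3, 1e-2),
}


def random_chromosome(rng):
    """Draw each gene uniformly from its allowed interval (sampling rule is assumed)."""
    return {gene: rng.uniform(low, high) for gene, (low, high) in SEARCH_SPACE.items()}


def initial_population(size=50, seed=0):
    """Build a reproducible initial population covering the search space."""
    rng = random.Random(seed)
    return [random_chromosome(rng) for _ in range(size)]


def select_elite(population, fitness, count):
    """Keep the `count` chromosomes with the smallest fitness value."""
    return sorted(population, key=fitness)[:count]


population = initial_population()
# Placeholder fitness (hypothetical): prefers a large index difference Delta.
elite = select_elite(population, lambda c: -c["Delta"], 5)
```

In an actual design run the placeholder fitness would be replaced by the weighted pulse
broadening fitness evaluated from the dispersion of each candidate structure.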

The wavelength and distance ranges for the optimization are selected as follows. For
λ0 = 1550 nm: 1500 nm < λ < 1600 nm; for λ0 = 1500 nm: 1450 nm < λ < 1550 nm; and
0 < Z < 200 km. In this design method Z is variable. In the simulations, an unchirped
initial pulse with a full width at half maximum of 5 ps is used. Using the search space in
Table 1 and the GA method, the optimal parameters are extracted and listed in Table 2.

                     λ0 (µm)  a (µm)   Δ          R1       R2        P        Q
σ = 0                1.55     2.0883   8.042e-3   0.5761   -0.4212   0.7116   0.3070
σ = 0                1.5      2.1109   7.036e-3   0.6758   -0.2785   0.8356   0.2389
σ = 2.7869 x 10^-8   1.55     2.0592   9.899e-3   0.7320   -0.2670   0.7552   0.2599
σ = 2.7869 x 10^-8   1.5      2.5822   9.111e-3   0.5457   -0.4237   0.7425   0.2880
σ = 3.6935 x 10^-8   1.55     2.2753   9.933e-3   0.5779   -0.4218   0.6666   0.3428
σ = 3.6935 x 10^-8   1.5      2.5203   9.965e-3   0.4867   -0.3841   0.6819   0.3324

Table 2. Optimized Optical and Geometrical Parameters at λ0 = 1500, 1550 nm and three given
Gaussian parameters

It is found that precise tuning of the zero dispersion wavelength together with a small
dispersion slope requires a large value of the index of refraction difference (Δ). That is,
a large index difference favors the simultaneous optimization of the zero dispersion
wavelength and the dispersion slope. First, we consider the dispersion behavior of the
structures. To demonstrate the capability of the proposed algorithm for the assumed data,
the dispersion characteristics obtained for the structures are illustrated in Fig. 3. It
shows that the zero dispersion wavelength can be controlled precisely through the central
wavelength, while the Gaussian parameter is used to manipulate the dispersion slope of the
profile. From Fig. 3 and Table 3, it is found that a zero value of the Gaussian parameter
tunes the zero dispersion wavelength most accurately (~100 times better than the other
cases).

Fig. 3. Dispersion vs. Wavelength at λ0 = 1500 nm, 1550 nm with σ as parameter.

Second, the dispersion slope is examined. The presented curves show that increasing the
Gaussian parameter makes the dispersion slope smaller and smoother at long wavelengths.
Furthermore, there is clearly a trade-off between tuning the zero dispersion wavelength and
decreasing the dispersion slope, as shown in Figs. 3 and 4 and Table 3.



type                 λ0 (µm)  Dispersion   Dispersion Slope  Effective Area  Mode Field Diameter  Quality Factor
                              (ps/km/nm)   (ps/km/nm²)       (µm²)           (µm)
σ = 0                1.55     -2.57e-4     0.0695            191.92          7.95                 3.04
σ = 0                1.5      2.55e-5      0.0828            344.15          9.76                 3.61
σ = 2.7869 x 10^-8   1.55     -0.013       0.0647            194.79          7.12                 3.85
σ = 2.7869 x 10^-8   1.5      0.008        0.0597            209.95          6.70                 4.68
σ = 3.6935 x 10^-8   1.55     -0.085       0.0592            150.05          6.82                 3.22
σ = 3.6935 x 10^-8   1.5      -0.089       0.0564            164.21          6.55                 3.82



Table 3. Dispersion, Dispersion Slope, Effective Area, Mode Field Diameter and Quality
Factor at λ0 = 1500 nm, 1550 nm and three given Gaussian parameters

Fig. 4. Dispersion slope vs. Wavelength at λ0 = 1500 nm, 1550 nm with σ as parameter.

The normalized field distribution of the designed MII-based structures is illustrated in
Figs. 5 and 6. Because of the special structure, the field distribution peak falls in
region III, so most of the field is displaced toward the cladding region. In addition, as
the Gaussian parameter increases (except for σ = 0), the field distribution peak shifts
toward the core and its tail in the cladding region is depressed. On the other hand, the
slope of the field distribution inside the cladding region increases with the Gaussian
parameter.

Fig. 5. Normalized field distribution versus the radius of the fiber at λ0 = 1500 nm with σ
as the parameter.

The effective area, which characterizes the nonlinear behavior of the suggested structures,
is illustrated in Fig. 7. The effective area becomes smaller as the Gaussian parameter
increases. Figs. 5-7 and Table 3 indicate a trade-off between a large effective area and a
small dispersion slope. The results in Fig. 4 show that the dispersion slope decreases as
the Gaussian parameter increases. However, the field distribution then shifts toward the
core, which leads to a smaller effective area. These points show that there is an inherent
trade-off between these two important quantities.

Fig. 7. Effective area versus wavelength at λ0 = 1550 nm, 1500 nm with σ as the parameter.

The mode field diameter, which is related to the bending loss, is illustrated in Figs. 8 and
9 for both central wavelengths. The mode field diameter clearly decreases as the Gaussian
parameter increases. In other words, the Gaussian parameter can be used to control the
bending loss in these structures. Furthermore, Table 3 shows that the mode field diameter
is ~7 µm in the designed structures.

In addition, Table 3 shows that the mode field diameter is not noticeably affected by an
increase in the effective area. This is why the quality factor is raised in these
structures, and why the average quality factor of the proposed structures is increased in
Fig. 9. The quality factor of the designed fibers is illustrated in Fig. 10. The
calculations show that the quality factor is generally larger than 3. By comparison, the
quality factor is smaller than unity in inner depressed clad fibers (W structures) and
around unity in depressed core fibers (R structures). This feature shows the high quality
of the proposed methodology. The quality factor decreases as the Gaussian parameter
increases, which is strongly related to the reduction in effective area.
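
The quality factors listed in Table 3 are consistent with the ratio of the effective area to
the square of the mode field diameter. The short check below recomputes them from the
tabulated values; the definition Aeff / MFD² is an inference from Table 3 rather than a
formula stated in the text, and the function name is hypothetical.

```python
def quality_factor(a_eff_um2, mfd_um):
    """Dimensionless ratio A_eff / MFD^2 (definition inferred from Table 3)."""
    return a_eff_um2 / mfd_um**2


# (effective area in um^2, mode field diameter in um, reported quality factor)
TABLE3_ROWS = [
    (191.92, 7.95, 3.04),
    (344.15, 9.76, 3.61),
    (194.79, 7.12, 3.85),
    (209.95, 6.70, 4.68),
    (150.05, 6.82, 3.22),
    (164.21, 6.55, 3.82),
]

computed = [quality_factor(a_eff, mfd) for a_eff, mfd, _ in TABLE3_ROWS]
# Largest disagreement between recomputed and reported values.
max_deviation = max(abs(q - row[2]) for q, row in zip(computed, TABLE3_ROWS))
```

Under this definition a fiber scores well when it keeps a large effective area (low
nonlinearity) without a correspondingly large mode field diameter (bending sensitivity).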

As another result, the dispersion length is illustrated in Fig. 11 for the given Gaussian
parameters and two central wavelengths. The narrow peaks at λ = 1500 nm and 1550 nm imply
precise tuning of the zero dispersion wavelengths. The higher-order dispersion length of
the designed fibers is shown in Fig. 12. It is clear that the higher-order dispersion
length increases with the Gaussian parameter.

Fig. 11. Dispersion Length vs. Wavelength at λ0 = 1.5, 1.55 µm.

Next, the nonlinear effect length for 1 mW input power is illustrated in Fig. 13. First, the
suggested structures have a large nonlinear effect length. For typical distances, these
simulations show that the fiber input power can be increased several hundred times before
the nonlinear effect length becomes comparable with the fiber dispersion length. Second,
the nonlinear effect length decreases as the Gaussian parameter is raised and increases
with wavelength.
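
The size of the dispersion length near the zero dispersion wavelength can be estimated from
the tabulated dispersion. The sketch below uses the standard relations β2 = -D λ² / (2π c)
and L_D = t² / |β2|; using the 5 ps pulse width directly as the time scale t is an
assumption of this sketch, the 17 ps/nm/km comparison value is a typical figure for
standard single-mode fiber that does not come from the text, and the function names are
hypothetical.

```python
import math

C_M_PER_S = 299_792_458.0  # speed of light in vacuum


def beta2_from_d(d_ps_nm_km, wavelength_um):
    """Convert D in ps/(nm km) to beta2 in ps^2/km via beta2 = -D*lambda^2/(2*pi*c)."""
    d_si = d_ps_nm_km * 1e-6                # ps/(nm km) -> s/m^2
    wavelength_m = wavelength_um * 1e-6     # um -> m
    beta2_si = -d_si * wavelength_m**2 / (2.0 * math.pi * C_M_PER_S)  # s^2/m
    return beta2_si * 1e27                  # s^2/m -> ps^2/km


def dispersion_length_km(beta2_ps2_km, t_ps):
    """Dispersion length t^2 / |beta2| in km for a pulse time scale t in ps."""
    return t_ps**2 / abs(beta2_ps2_km)


# Comparison case: typical standard single-mode fiber at 1.55 um (assumed value).
beta2_std = beta2_from_d(17.0, 1.55)
ld_std = dispersion_length_km(beta2_std, 5.0)

# Designed fiber: D from Table 3 for sigma = 0 at 1.55 um.
beta2_opt = beta2_from_d(-2.57e-4, 1.55)
ld_opt = dispersion_length_km(beta2_opt, 5.0)
```

Because L_D scales as 1/|D|, the very small residual dispersion of the optimized structure
translates into a dispersion length several orders of magnitude longer than that of the
comparison fiber.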

The amount of fiber bending loss depends strongly on the bend radius and the mode field
diameter. Figures 14 and 15 illustrate the bending loss (dB/m) versus the bending radius
(mm) at λ0 = 1550 nm and 1500 nm, respectively, with the variance of the weighting function
(σ) as a parameter. According to Figs. 8, 9, 14, and 15, a smaller mode field diameter
yields greater tolerance to bending.

All of the presented results show that the suggested approach can produce a fiber with
higher performance. We have presented a novel method that achieves small dispersion, small
dispersion slope, large effective area, and small mode field diameter simultaneously [24].
All the requirements of a zero dispersion shifted communication system are therefore met.
This advantage is obtained owing to the choice of the basic fiber structure as well as the
optimization method. The selected fiber structure is the MII, and the weighted fitness
function is applied in the GA for optimization. By combining a suitable structure with the
novel optimization method, all of the stated advantages can be obtained simultaneously. The
proposed method can be extended to all fiber structures, introduces two parameters instead
of multiple design parameters, and tunes the zero dispersion wavelength precisely.

Ring index profile fibers have attracted close attention because they have larger effective
areas, which can minimize the harmful effects of fiber nonlinearity [29]. For the proposed
MII fiber structures, small dispersion and dispersion slope were obtained thanks to a
design method based on the genetic algorithm, but the bending loss characteristic was not
considered in the design process. Here we include the bending loss directly in the fitness
function and present an optimized RII triple-clad optical fiber that performs well in terms
of dispersion, dispersion slope, and bending loss. The refractive index profile of the RII
fiber structure is shown in Fig. 16.

Fig. 16. Index of Refraction Profile for RII Structure

To calculate the dispersion, dispersion slope, and bending loss characteristics of the
structure, the geometrical and optical parameters are defined as follows.



P = [...]/c,   Q = [...]/c    (7)

R1 = [...],   R2 = [...],   Δ = [...]/(2n[...])    (8)



The design method is based on a combination of the Genetic Algorithm (GA) and Coordinate
Descent (CD) approaches. The GA is a scatter-shot search technique, whereas CD is a
single-shot technique. A single-shot search is very quick compared with the scatter-shot
type but depends critically on the guessed initial parameter values, so for the CD search
the initial search position is of considerable importance. In the GA, it is possible to
define a fitness function and evaluate every individual of the population with it. We have
therefore combined the CD and GA methods to improve the selection of the initial point with
the help of the generation elite while inheriting the quick convergence of coordinate
descent [30]. In other words, we cover and evaluate the answer zone with the initial
population, derive a few generations, and use the elite of the latest generation as the
initial search position for the CD (Fig. 17).
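
The hand-off from the GA elite to coordinate descent can be sketched as follows. This is a
simplified illustration and not the authors' procedure: the GA stage is reduced to a single
random generation for brevity, the coordinate descent uses a fixed step that is halved
whenever no coordinate move improves the cost, and the toy cost function, bounds, and
function names are hypothetical.

```python
import random


def coordinate_descent(cost, start, bounds, step=0.1, shrink=0.5, tol=1e-6):
    """Minimize `cost` by moving one coordinate at a time from `start`."""
    x = list(start)
    best = cost(x)
    # Initial step per coordinate, expressed as a fraction of its range.
    steps = [step * (high - low) for low, high in bounds]
    while max(steps) > tol:
        improved = False
        for i, (low, high) in enumerate(bounds):
            for direction in (1.0, -1.0):
                trial = list(x)
                trial[i] = min(high, max(low, x[i] + direction * steps[i]))
                trial_cost = cost(trial)
                if trial_cost < best:
                    x, best, improved = trial, trial_cost, True
                    break
        if not improved:
            # No coordinate move helped: refine the step size.
            steps = [s * shrink for s in steps]
    return x, best


def hybrid_search(cost, bounds, population_size=50, seed=0):
    """Pick the elite of a random population, then refine it with coordinate descent."""
    rng = random.Random(seed)
    population = [[rng.uniform(low, high) for low, high in bounds]
                  for _ in range(population_size)]
    elite = min(population, key=cost)
    return coordinate_descent(cost, elite, bounds)


def toy_cost(x):
    """Hypothetical smooth cost with its minimum at (0.3, -0.2)."""
    return (x[0] - 0.3) ** 2 + (x[1] + 0.2) ** 2


best_x, best_cost = hybrid_search(toy_cost, [(0.0, 1.0), (-1.0, 0.0)])
```

Starting the local search from the elite rather than from an arbitrary guess is what lets
the quick single-shot method avoid a poor initial position.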

Fig. 17. The Block Diagram of the Proposed Method

To derive the suggested design methodology, the following weighted cost function is
introduced. We have normalized the pulse broadening factor so that it is comparable with
the bending loss. This normalization is essential for optimizing the pulse broadening
factor and the bending loss simultaneously. Otherwise, the effect of the bending loss would
be imperceptible and lost in the broadening factor term.

[...] · BL(λ),    (9)



The bending radius is fixed at 1 cm. The fitness function includes the effects of
dispersion (β2), dispersion slope (β3), and bending loss (BL). In the weighted fitness
function, the internal summation is included to account for the broadening factor at each
length up to 200 km. As stated at the beginning of this section, the zero dispersion
wavelength can be set at λ0 and the dispersion slope controlled by the Gaussian parameter
(σ). The dispersion behavior obtained for the structures is illustrated in Fig. 18, which
clearly demonstrates the influence of the λ0 and σ parameters. The zero dispersion
wavelength is successfully set at λ0, and the dispersion curve becomes flatter for larger
values of σ.

To show the capability of the proposed algorithm, Table 4 summarizes the characteristics of
these three structures. Fig. 18 and Table 4 make it clear that there is a trade-off between
tuning the zero dispersion wavelength and decreasing the dispersion slope. In other words,
a zero value of σ tunes the zero dispersion wavelength most accurately (~100 times better
than the other cases).

The effective area, which governs the nonlinear behavior of the suggested structures, is
listed in Table 4. These values are high enough for optical transmission applications. Owing
to the special structure of the RII-type fiber, the peak of the field distribution falls in
the first cladding layer, so most of the field is displaced into the cladding region. This is
the origin of the large effective area of the designed structures. The normalized field
distribution of the RII-based designed structures is illustrated in Fig. 19.

Fig. 18. Dispersion vs. Wavelength at λ0 = 1.55 µm.



| Type | D(λ = 1.55 µm) (ps/km/nm) | S(λ = 1.55 µm) (ps/km/nm²) | BL(λ = 1.55 µm) (dB/m) | A_eff(λ = 1.55 µm) (µm²) |
|---|---|---|---|---|
| σ = 0.0 | 1.38e-4 | 0.048 | 1.90e-2 | 86.84 |
| σ = 1.12e-8 | -6.15e-4 | 0.041 | 1.67e-1 | 82.53 |
| σ = 3.69e-8 | 4.50e-2 | 0.035 | 4.66e-2 | 86.01 |

Table 4. Dispersion, Dispersion Slope, Bending Loss, and Effective Area at λ0 = 1.55 µm and
Three Given Gaussian Parameters

Because of the thermo-optic coefficient of the refractive index and the thermal expansion
coefficient, the optical and geometrical parameters of the fiber change with temperature.
Consequently, transmission characteristics such as the dispersion, its slope, and the bending
loss also change. To evaluate the thermal stability of the designed structures, the results
in Table 5 were extracted. The quantities dD/dT, dS/dT, dλ0/dT, and dBL/dT are the thermal
coefficients of the chromatic dispersion, its slope, the zero-dispersion wavelength, and the
bending loss at 1.55 µm, respectively. The results show that this environmental factor must
be considered in optical fiber design. For example, in the worst case the zero-dispersion
wavelength shifts by more than 3 nm over a temperature change of 100 °C.

In the last design we focused on the RII depressed-core triple-clad single-mode optical
fiber and presented a combined optimization approach to reach the desired design goals. We
used a special fitness function that includes the effects of dispersion, its slope, and
bending loss simultaneously. Applying this fitness function with the higher σ, we obtained a
dispersion and dispersion slope over the [1.5, 1.6] µm interval of [-1.77, +1.77] ps/km/nm
and [0.037, 0.033] ps/km/nm², respectively. The bending loss at 1.55 µm with a 1 cm radius
of curvature and the effective area are 4.66e-2 dB/m and 86.01 µm², respectively. The
thermal stability of the designed structures was also evaluated.

Fig. 19. Normalized field distribution versus the radius of the fiber at λ = 1.55 µm with σ as
parameter (dashed, solid line, dotted, and dashed-dotted curve represent the core and three
cladding layers, respectively).

| Type | dD/dT (ps/km/nm/°C) | dS/dT (ps/km/nm²/°C) | dλ0/dT (nm/°C) | dBL/dT (dB/m/°C) |
|---|---|---|---|---|
| σ = 0.0 | -1.22×10⁻³ | +2.83×10⁻⁶ | +2.5×10⁻² | +3.97×10⁻⁶ |
| σ = 1.12e-8 | -1.21×10⁻³ | +2.93×10⁻⁶ | +3.33×10⁻² | +2.70×10⁻⁵ |
| σ = 3.69e-8 | -1.21×10⁻³ | +2.93×10⁻⁶ | +2.5×10⁻² | +8.79×10⁻⁶ |

Table 5. Dispersion, Dispersion Slope, and Bending Loss Thermal Coefficients at λ0 = 1.55 µm
and Three Given Gaussian Parameters

A zero-dispersion-shifted fiber can also be designed with a graded-index structure. The main
design targets are the dispersion value and the effective area at 1550 nm, chosen to
minimize pulse broadening and nonlinear effects. A survey of work on large-mode-area fibers
shows that little attention has been paid to the design of zero-dispersion-shifted fibers
based on graded-index structures. The refractive index profile of the triangular-core
graded-index optical fiber structure, which we suggest here for the first time, is shown in
Fig. 20. The proposed graded-index fiber has a linear index variation in the core region.
In the TMM approach, the refractive index of a fiber with an arbitrary but axially symmetric
profile is approximated by a staircase function, from which the field distribution and
guided modes are calculated [26].
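
The staircase approximation mentioned above can be illustrated with a short Python sketch. It is not the authors' TMM code and only discretizes the core. The cladding index of 1.444, the number of layers, and the relation Δ = (n1² - n2²)/(2n1²) used to obtain the peak index are assumptions made for the example, as are the function names; the core radius and Δ are the values quoted later in this section.

```python
def staircase_profile(n_of_r, r_max, layers):
    """Approximate a radial index profile n(r) on [0, r_max] by flat steps."""
    width = r_max / layers
    steps = []
    for k in range(layers):
        r_in, r_out = k * width, (k + 1) * width
        # Sample the profile at the midpoint of each thin layer.
        steps.append((r_in, r_out, n_of_r(0.5 * (r_in + r_out))))
    return steps

a = 1.8          # core radius in micrometres (value quoted in the text)
delta = 8e-3     # relative index difference (value quoted in the text)
n_clad = 1.444   # assumed cladding index near 1.55 um
# Assumed definition: delta = (n1^2 - n2^2) / (2 n1^2)
n_peak = n_clad / (1 - 2 * delta) ** 0.5

def triangular_core(r):
    """Linear index variation from n_peak on the axis to n_clad at r = a."""
    return n_peak - (n_peak - n_clad) * (r / a)

layers = staircase_profile(triangular_core, a, 10)
```

Each tuple in `layers` holds the inner radius, outer radius and constant index of one step; a transfer-matrix solver would then match the fields across these step boundaries.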
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Fig. 20. Refractive Index Profile for Triangular Core Graded Index Fiber Structure 

To simplify the problem and the calculation of the dispersion and its slope for the proposed
fiber, the following optical and geometrical parameters are defined.



2n 



-, w 3 =n i +rj{n 2 -n i 'j,n 2 =n 3 + /u(n t -n 3 ) . 



(9) 



The coefficients η and μ are introduced to describe the refractive indices of the layers and
are set between 0 and 1. The parameter P relates the cladding layer thickness to the core
radius. The design method is based on the limited coordinate descent (CD) approach [30].
Extensive investigation shows that a smaller core radius and a larger refractive index
difference move the zero-dispersion wavelength toward 1.55 µm. Therefore, to design the
dispersion-shifted optical fiber, Δ and the core radius are set to 8×10⁻³ and 1.8 µm,
respectively. The core and the first cladding layer are also assumed to have the same
thickness. A direct search is then carried out for the η and μ parameters in the [0, 1]
interval. To derive the suggested design methodology, the following fitness function, which
includes the pulse broadening factor, is introduced.



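$$F = \sum_{Z} \left[ 1 + \left( \frac{\beta_2(\lambda_0)\, Z}{t_i^{2}} \right)^{2} + \left( \frac{\beta_3(\lambda_0)\, Z}{2 t_i^{3}} \right)^{2} \right]^{1/2} \qquad (10)$$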



In this fitness function, the summation accounts for the broadening factor at each length
up to 200 km. A glance at Eq. (10) shows that it is a restricted version of the earlier
fitness function [24], in which the optimization over the wavelength interval is omitted. It
is also unweighted, because the fitness function is evaluated at a single wavelength (λ0),
at which the zero-dispersion wavelength can be set. In fiber design, the zero-dispersion
wavelength is preferably shifted to the region where the fiber has its lowest attenuation.
The optical attenuation has a global minimum around the 1.55 µm wavelength, which is why
most optical communication systems operate at this wavelength. Accordingly, the λ0 parameter
of the applied method is set to 1.55 µm to achieve the desired zero-dispersion-shifted
structure. With the parameters presented above and the CD method, the design procedure is
carried out and the optimal parameters are extracted and listed in Table 6.

| Core radius | Δ | P | η | μ |
|---|---|---|---|---|
| 1.8 µm* | 8×10⁻³ | 0.5 | 0.25 | 0.44 |

*b = 2a

Table 6. The Optical and Geometrical Parameters for the Designed Structure.

The dispersion is -0.04549 ps/km/nm at 1.55 µm, which is suitably small. Nonlinear effects
in a single-mode fiber are the ultimate factors limiting bit rate and distance in long-haul
optical fiber communication systems. Large-effective-area single-mode fibers have therefore
been the subject of considerable study recently. The effective area of the suggested
structure is 65.3 µm² at 1.55 µm, which is acceptable for this application. The
associated mode field diameter at this wavelength is 9.03 µm.
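
As a rough consistency check of the two figures above, the effective area of a purely Gaussian mode can be estimated from the mode field diameter as A_eff ≈ π(MFD/2)². The short Python sketch below makes that Gaussian-mode assumption explicit; the function name is chosen only for this illustration.

```python
import math

def gaussian_effective_area(mfd_um):
    """Effective area (um^2) of a Gaussian mode with mode field diameter mfd_um."""
    w = mfd_um / 2.0          # mode field radius
    return math.pi * w ** 2

estimate = gaussian_effective_area(9.03)   # roughly 64 um^2
```

The estimate of roughly 64 µm² is close to the reported 65.3 µm². A small difference is expected because the actual field of a graded-index fiber is not exactly Gaussian.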



8. Non-Zero Dispersion Shifted Fibers (NZDSFs) 

The use of commercially available erbium-doped fiber amplifiers (EDFAs), which requires
optical communication systems to operate in the 1550 nm window, has significantly reduced
the link-length limitation imposed by attenuation in the optical fiber. However, high-bit-rate
(~10 Gb/s) data transmission can be limited by the large inherent dispersion of the fiber.
Dispersion-shifted fibers (DSFs), which have zero dispersion around 1550 nm, have been
proposed and developed to overcome this problem. To increase the information-carrying
capacity, however, the latest high-speed communication systems are based on dense
wavelength division multiplexing/demultiplexing (DWDM). In such systems, nonlinear effects
such as four-wave mixing (FWM), which arise from simultaneous transmission at many closely
spaced wavelengths and the high optical gain of the EDFA, impose serious limitations on the
use of a DSF with its zero-dispersion wavelength at 1550 nm. To overcome this difficulty,
nonzero dispersion-shifted fibers with a small dispersion in the range of ~2-4 ps/km/nm over
the entire gain window of the EDFA have been proposed. In such fibers the phase-matching
condition is not satisfied, and hence the effect of FWM becomes negligible because of the
small but nonzero dispersion. Nonlinear effects such as cross-phase modulation (XPM), which
limits the number of different wavelength signals, can be reduced by increasing the mode
field diameter (MFD) and hence the effective area of the fiber. Large-effective-area nonzero
dispersion-shifted fibers have therefore been developed [31]. To achieve a large effective
area together with low bending loss and low splice loss to conventional fiber, the
refractive index profile shown in Fig. 21 is designed; it is described mathematically by
Eq. (11).

«i Ir| < R, 
where R1, R2, R3, and R4 are radius parameters; R0 is the center of the side core (the
midpoint of its two bounding radii); n1, n2, and n3 are the highest refractive index of the
central core, the refractive index of the cladding, and the highest refractive index of the
side core, respectively; Δ1 = (n1 - n2)/n2 and Δ2 = (n3 - n2)/n2 are the relative profile
heights of the central core and side core, respectively; and α is the curve parameter. The
fiber consists of three parts: a central core, a side core, and a cladding layer. The side
core inside the core region (i.e., the region r < R4) is designed to let more signal energy
flow into the cladding region so that a large effective area can be obtained [31].







Fig. 21. Refractive index profile of newly designed large effective area, low bending and 
splice loss NZ-DSF. 

Eq. (11) shows that the exact refractive index profile is controlled by seven parameters
(i.e., R1, R2, R3, R4, Δ1, Δ2, and α). To ensure single-mode operation, R4 must be smaller
than a certain value. Six parameters therefore remain to be optimized to achieve the
required dispersion slope, large effective area, low splice loss, low bending loss, and low
Rayleigh scattering loss simultaneously. A random search shows that the fiber has its
optimum performance under the following conditions:

a = 3.2 µm (radius of fiber core)

R1 = 0.218309a = 0.6986 µm

R2 = 0.568242a = 1.8184 µm

R3 = 0.7325779a = 2.3442 µm

R4 = 1.0a = 3.2 µm

Δ1 = 0.6425%

Δ2 = 0.2378%

α = 1.6
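
The radii in the list above are simple fractions of the core radius a. The following Python lines, given only as an arithmetic check with illustrative variable names, reproduce the quoted values to four decimal places.

```python
a = 3.2  # core radius in micrometres
fractions = {"R1": 0.218309, "R2": 0.568242, "R3": 0.7325779, "R4": 1.0}

# Each radius is the listed fraction of the core radius.
radii = {name: f * a for name, f in fractions.items()}
rounded = {name: round(r, 4) for name, r in radii.items()}
```

The rounded results are 0.6986, 1.8184, 2.3442 and 3.2 µm, and the radii increase monotonically from the central core to the core boundary.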

The Gaussian approximation method is used to calculate the electric field distribution for
the designed refractive index profile. The designed fiber has a dispersion of about 4
ps/km/nm and a dispersion slope of about 0.06 ps/km/nm² at the 1.55 µm operating
wavelength, which can be used to avoid four-wave mixing (FWM). The zero-dispersion
wavelength is set near 1480 nm. In addition, calculations show that the designed fiber not
only has a large effective area of over 100 µm² but also has low bending loss (<1.3×10⁻³ dB
with a 30 mm bending radius and 100 turns) and low splice loss (<6.38×10⁻³ dB) to
conventional fiber. To broaden the usable wavelength range beyond the conventional C-band
(1530-1565 nm) to the L-band (1565-1620 nm) and S-band (1460-1530 nm), the dispersion slope
should be as low as possible, which reduces the cost of dispersion compensation and
suppresses self-phase modulation, especially in 40 Gb/s communication systems.

Now we present a new depressed-core-index dual-ring profile, which provides a large A_eff
and a low dispersion slope simultaneously, with a zero-dispersion wavelength below 1430 nm
[32]. Fibers with this refractive index profile have low bending loss and low intrinsic
loss, and the splice loss reaches an acceptable value when they are spliced to conventional
single-mode fiber. The principal design requirements for the fibers are a large A_eff, a
small dispersion slope, low bending and intrinsic loss, low polarization-mode dispersion,
and a zero-dispersion wavelength below 1430 nm.

The relative refractive index Δni is defined by Δni = (ni - nc)/nc, where nc is the
refractive index of the outer cladding layer. Dopants decrease the viscosity of the glass;
that is, a higher dopant concentration means a lower glass viscosity. A large difference
between Δn1 and Δn2 may therefore cause a large viscosity difference between the depressed
core layer and the first raised ring, so more mechanical stress is built into the fiber
during the drawing process [33]. A large difference between the refractive indices of the
layers also increases the compositional variation in the fiber, so more thermal stress can
be caused by the radial variation of the thermal expansion coefficient [34]. Residual
stress not only weakens the fiber but also increases its attenuation and polarization-mode
dispersion. For these reasons, the refractive index profile shown in Fig. 22 is proposed.
All parameters of the fiber profile are listed in Table 6, where Δni are the relative
refractive indices of the successive layers from the depressed core layer to the cladding.
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Fig. 22. Improved refractive index profile with dual ring and depressed outer ring based on 
the depressed core-index. 



| Δn1 (%) | Δn2 (%) | Δn3 (%) | Δn4 (%) | Δn5 (%) | r1 (µm) | r2 (µm) | r3 (µm) | r4 (µm) | r5 (µm) |
|---|---|---|---|---|---|---|---|---|---|
| 0.14 | 0.57 | -0.27 | 0.30 | -0.18 | 2.50 | 4.10 | 6.88 | 9.98 | 12.41 |

Table 6. Parameters of refractive index profile shown in Fig. 22

Table 7 shows the optical characteristics of a fiber fabricated according to the refractive
index profile parameters in Table 6, where MFD and RDS are the mode field diameter and
relative dispersion slope, respectively. The fiber has a large A_eff of 105 µm² and, at the
same time, a small dispersion slope of about 0.065 ps/km/nm². The macrobending loss at
1550 nm is less than 0.05 dB/km (100 turns on a 60 mm diameter mandrel).



| Parameter | Wavelength (nm) | Typical value |
|---|---|---|
| Dispersion (ps/km/nm) | 1460 | 3.465 |
| | 1550 | 9.569 |
| | 1625 | 14.324 |
| Dispersion slope (ps/km/nm²) | 1550 | 0.0648 |
| Zero dispersion wavelength (nm) | - | 1413 |
| Attenuation (dB/km) | 1550 | 0.210 |
| MFD (µm) | 1550 | 10.2 |
| A_eff (µm²) | 1550 | 105.6 |
| A_eff × dispersion | 1550 | 1010.5 |
| RDS (nm⁻¹) | 1550 | 0.0068 |
| PMD (ps/km^0.5) | 1550 | 0.04 |
| Macro bending loss for 100 turns on the 60 mm diameter (dB/km) | 1550 | 0.006 |
| | 1625 | 0.015 |

Table 7. Optical characteristics of the fabricated fiber
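
Two rows of Table 7 are derived quantities and can be recomputed from the other rows: the product A_eff × dispersion, and the relative dispersion slope, taken here in its usual sense as the dispersion slope divided by the dispersion. The Python lines below are a small arithmetic check with illustrative variable names; the average slope between 1460 and 1550 nm is added only for comparison.

```python
dispersion = {1460: 3.465, 1550: 9.569, 1625: 14.324}  # ps/km/nm
slope_1550 = 0.0648   # ps/km/nm^2
a_eff_1550 = 105.6    # um^2

# Derived rows of the table, both at 1550 nm.
aeff_times_d = a_eff_1550 * dispersion[1550]
rds = slope_1550 / dispersion[1550]   # nm^-1

# Average slope between 1460 and 1550 nm, for comparison with slope_1550.
avg_slope = (dispersion[1550] - dispersion[1460]) / (1550 - 1460)
```

The recomputed values agree with the tabulated 1010.5 and 0.0068 nm⁻¹, and the average slope of about 0.068 ps/km/nm² is consistent with the quoted slope at 1550 nm.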

Table 7 shows that the zero-dispersion wavelength is below 1430 nm and that the dispersion
at 1460, 1550, and 1625 nm is 3.465, 9.569, and 14.324 ps/km/nm, respectively. This fiber
can therefore be used not only in the conventional C-band for transmission links but also in
the S-band and L-band. With the progress being made in the practical application of the
Raman amplifier, this fiber may also be applicable in transmission links using distributed
Raman amplification in the future. Furthermore, the value of A_eff × dispersion is large
enough to suppress dispersion-related nonlinear effects in the transmission system [35].

A large-effective-area fiber with a nonzero dispersion of about 4 ps/km/nm in the 1.55 µm
wavelength band is a good way to avoid the four-wave mixing (FWM) effect, which in turn
enhances the performance of wavelength division multiplexing (WDM) systems [31]. However,
such large-effective-area fibers have a relatively large dispersion slope, which also
restricts the number of different wavelength signals [21, 36-38]. Considerable effort is
therefore being made to reduce the dispersion slope of such fibers to keep pace with the
rapid progress of DWDM systems [39, 40]. The refractive index profile of the fiber is shown
in Fig. 23. This profile is called the RI type in the literature.

To obtain a flat modal field over the entire central dip region, the effective index (n_eff)
of the mode should equal the refractive index of the central dip (i.e., n_eff = n1) [15].
The values of the parameters used in the design of the dispersion characteristics are
tabulated in Table 8. Here, the relative index difference Δi is given by
Δi = (ni² - n4²)/(2ni²), i = 1, 2, 3. The MFD and A_eff associated with the modal field of
the proposed design are 8.3 µm and 56.1 µm², respectively. The total dispersion coefficient
(D), which includes both the waveguide and material dispersion of the fiber, is calculated
over the wavelength range of 1530 to 1610 nm, which covers the entire C- and L-bands of
erbium-doped fiber amplifiers.

To study the tolerance of the characteristics of the proposed fiber design, we randomly
changed the thickness and Δ of each region by 1%. The actual and the corresponding perturbed
refractive index profiles are shown schematically in Fig. 23 by the solid and dashed curves,
respectively. Figure 24 shows the variation of the total dispersion (D) with wavelength; the
solid and dashed curves again correspond to the actual and the perturbed profiles. Over the
entire wavelength range of 1530 to 1610 nm, the dispersion of both profiles lies within
2.6-3.4 ps/km/nm, which is inside the range (2-4 ps/km/nm) needed to avoid four-wave mixing
(FWM) [41].
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Fig. 23. Schematic of the refractive index profiles of the proposed fiber (solid curve). The 
dashed curve corresponds to the perturbed refractive index profile. 



| a (µm) | b (µm) | d (µm) | Δ1 (%) | Δ2 (%) | Δ3 (%) |
|---|---|---|---|---|---|
| 1.0 | 3.1 | 6.5 | 0.03 | 0.48 | -0.20 |

Table 8. The values of various parameters used in design
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Fig. 24. Variation of the total dispersion (D) as a function wavelength. The solid and dashed 
curves correspond to the proposed and the perturbed refractive index profiles (shown
schematically in Fig. 23), respectively.

The dispersion and dispersion slope of the designed fiber at λ0 = 1550 nm are 3.0 ps/km/nm
and 0.01 ps/km/nm², respectively. The maximum dispersion slope over the entire wavelength
range of 1530 to 1610 nm is 0.015 and 0.014 ps/km/nm² for the actual and perturbed
refractive index profiles, respectively. This shows that the perturbation of the refractive
index profile changes the dispersion slope very little.

9. Dispersion Flattened Fibers (DFFs) 

In conventional optical fibers the dispersion increases with wavelength. Different channels
therefore broaden by different amounts, which makes multichannel operation difficult. A
suitable optical fiber should have both a small dispersion and a small dispersion slope over
the predefined wavelength interval [42]. The idea of achieving low dispersion over a range
of wavelengths was first suggested by Kawakami and Nishida in 1974 [43, 44]. They proposed
the original "W" fiber structure and explained the importance of a relatively narrow
depressed cladding region in modifying the waveguide dispersion so that the dispersion curve
turns over and has two zero-dispersion wavelengths [45]. To minimize pulse broadening in an
optical fiber, the chromatic dispersion should be low over the wavelength range used. A
fiber in which the chromatic dispersion is low over a broad wavelength range is called a
dispersion-flattened fiber. The rms value, that is, the function to be minimized, is:

where C is the chromatic dispersion. A normalized W profile is given by 

(14) 

where N1 > 1 and N2 < 1. The constraint that the first higher-order mode should appear
exactly at 1.25 µm is imposed. There are thus four variables, namely (N1, N2, b, a), and one
constraint. Assume that N1 and N2 are given fixed values; the values N1 = 1.02 and
N2 = 0.99 will prove to be interesting. If b = a, the W profile degenerates into a
step-index profile. The core radius, a, of this step-index fiber is easily calculated from
the exact cutoff condition V = 2.405, where V is the normalized frequency. The value
N1 = 1.02 yields b = a = 1.64 µm. Direct numerical calculation shows that if the outer
radius, a, is increased, then the inner radius, b, must also be increased to keep the cutoff
wavelength at 1.25 µm. Hence the constraint λc = 1.25 µm corresponds to a curved line in the
a-b plane. The rms value of the chromatic dispersion along this line is given in Fig. 25.
The point of minimum dispersion is easily located.
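
The step-index starting point quoted above can be checked with the cutoff condition V = (2πa/λc)·sqrt(n_core² - n_clad²) = 2.405. The Python sketch below assumes that N1 multiplies a pure-silica cladding index and that this index is about 1.447 near 1.25 µm; both are assumptions made for the illustration, as is the function name.

```python
import math

def step_index_cutoff_radius(n_core, n_clad, cutoff_wavelength_um, v_cutoff=2.405):
    """Core radius (um) at which the first higher-order mode cuts off."""
    na = math.sqrt(n_core ** 2 - n_clad ** 2)   # numerical aperture
    return v_cutoff * cutoff_wavelength_um / (2 * math.pi * na)

n_silica = 1.447   # assumed cladding index near 1.25 um
a = step_index_cutoff_radius(1.02 * n_silica, n_silica, 1.25)
```

Under these assumptions the radius comes out within about 0.01 µm of the 1.64 µm quoted in the text.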

This procedure is repeated for different combinations of N1 and N2, and the result is given
in Table 9. The first column of this table, i.e., N2 = 1, corresponds to step-index
profiles. According to Table 9, the global minimum is 0.9 ps/km/nm, and the corresponding
optimal W fiber is (N1, N2, b, a) = (1.02, 0.99, 1.91 µm, 2.85 µm); see Fig. 26. The global
minimum is flat; i.e., there is a valley in Table 9 giving roughly the same rms dispersion.
Another observation is that the dependence on N2 in Table 9 is weak if N1 is less than 1.01,
whereas it is strong if N1 is greater than 1.01 and N2 is close to unity.
Fig. 25. The rms value of the chromatic dispersion over the vacuum wavelength range (1.25
µm, 1.60 µm) as a function of the outer radius a in a W fiber. The cutoff vacuum wavelength
is 1.25 µm. The relative refractive-index increases in the core and in the inner cladding are
1.02 and 0.99, respectively.

Table 9. Minimum rms chromatic dispersion (ps/km/nm) for different doping levels in the
core and cladding

Fig. 26. Chromatic dispersion for the optimal W fiber (N1, N2, b, a) = (1.02, 0.99, 1.91 µm,
2.85 µm). The rms value of the chromatic dispersion over the vacuum wavelength range
(1.25 µm, 1.60 µm) is equal to 0.9 ps/km/nm. The cutoff vacuum wavelength is 1.25 µm.

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An exhaustive method for calculating the minimum rms chromatic dispersion in W fibers 
has been presented [45]. The procedure is to generate all W fibers with a given cutoff
wavelength and then find the minimum rms dispersion by one-dimensional minimization
followed by direct inspection. In the case investigated, the W fiber was found to be capable
of dispersion flattening only if high doping levels are used. Dispersion and its slope are
both responsible for pulse broadening [4]. We therefore believe that combining these
parameters in the design procedure leads to a fiber with attractive performance. In other
words, we optimize the dispersion and its slope simultaneously to achieve a small dispersion
and a small slope over the predefined wavelength interval, that is, the band in which flat
dispersion behavior is wanted [42]. We use the WII-type optical fiber structure, whose
refractive index profile is shown in Fig. 27.
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Fig. 27. The index of refraction profile for the proposed structures (WII) 

To simplify the problem, the following optical parameters are defined.



R, 
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For this structure the geometrical parameters are introduced in the following. 

„ b 



,Q 



(15) 



(16) 



To achieve the flattening purpose, we have defined a weighted fitness function. Equation 
(17) shows our proposal for the weighted fitness function of the un-chirped pulse 
broadening factor. 



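$$F = \sum_{\lambda} e^{-\frac{(\lambda-\lambda_0)^{2}}{2\sigma^{2}}} \sum_{Z} \left[ 1 + \left( \frac{\beta_2(\lambda)\, Z}{t_i^{2}} \right)^{2} + \left( \frac{\beta_3(\lambda)\, Z}{2 t_i^{3}} \right)^{2} \right]^{1/2} \qquad (17)$$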



This fitness function was first applied to the design of dispersion-shifted fiber, but the
results presented here show that it is also appropriate for dispersion flattening [42]. With
this fitness function, the zero-dispersion wavelength can be shifted to the central
wavelength (λ0). Because the weight of the pulse broadening factor at λ0 is greater than at
the other wavelengths, the zero-dispersion wavelength is more likely to be found at λ0 than
elsewhere. Meanwhile, the flattening of the dispersion curve is controlled by the Gaussian
parameter (σ). Put another way, increasing σ broadens the Gaussian weighting function over
the predefined wavelength interval. As a result, the pulse broadening factor is taken into
account with a larger weight at more wavelengths, which considerably decreases the
dispersion slope in the interval. Consequently, the zero-dispersion wavelength and the
dispersion slope can be tuned by λ0 and σ, respectively. The wavelength and distance ranges
for the design are 1.50 µm < λ < 1.60 µm and 0 < z < 200 km. In the simulations, an
un-chirped initial pulse with a full width at half maximum of 5 ps is used. Figure 28 shows
that the zero-dispersion wavelength is successfully set at λ0, which is equal to 1.55 µm.
Furthermore, adding the Gaussian weighting term to the fitness function makes the
dispersion curve much flatter: without the weighting function, the optimized dispersion has
a higher slope than with it.
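
The role of the Gaussian parameter can be illustrated with a minimal Python sketch of a weighted fitness function. It is not the authors' code. The form assumed for the un-chirped broadening factor, sqrt(1 + (β2·z/t²)² + (β3·z/(2t³))²), the handling of σ = 0 as "central wavelength only", the toy β2 and β3 curves, and all function names are assumptions made for the example.

```python
import math

def broadening_factor(beta2, beta3, z, t_i):
    """Assumed un-chirped broadening factor after length z (units: ps, km)."""
    return math.sqrt(1.0 + (beta2 * z / t_i ** 2) ** 2
                     + (beta3 * z / (2.0 * t_i ** 3)) ** 2)

def weighted_fitness(beta2_of, beta3_of, lam0, sigma, wavelengths, lengths, t_i):
    """Gaussian-weighted sum of broadening factors over wavelength and length."""
    total = 0.0
    for lam in wavelengths:
        if sigma > 0:
            weight = math.exp(-((lam - lam0) ** 2) / (2.0 * sigma ** 2))
        else:
            # sigma = 0 is treated here as "only the central wavelength counts".
            weight = 1.0 if lam == lam0 else 0.0
        if weight == 0.0:
            continue
        for z in lengths:
            total += weight * broadening_factor(beta2_of(lam), beta3_of(lam), z, t_i)
    return total

lam_grid = [1.50e-6 + i * 1.0e-8 for i in range(11)]   # 1.50 to 1.60 um, in metres
lam0 = lam_grid[5]                                     # central wavelength
z_grid = [20.0 * k for k in range(1, 11)]              # 20 to 200 km

def toy_beta2(lam):
    """Toy dispersion (ps^2/km) that vanishes at the central wavelength."""
    return 2.0e8 * (lam - lam0)

def toy_beta3(lam):
    """Toy constant third-order dispersion (ps^3/km)."""
    return 0.1

f_single = weighted_fitness(toy_beta2, toy_beta3, lam0, 0.0, lam_grid, z_grid, 5.0)
f_wide = weighted_fitness(toy_beta2, toy_beta3, lam0, 3.6935e-8, lam_grid, z_grid, 5.0)
```

With σ = 0 only the central wavelength contributes, whereas a wider Gaussian also penalizes broadening at neighboring wavelengths. That extra penalty is what pushes an optimizer toward a flatter dispersion curve.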







Fig. 28. Dispersion versus Wavelength with and without weighting function 

The impact of the σ parameter on the dispersion and its slope is illustrated in Figures 29
and 30. The dispersion in the predefined wavelength interval decreases as σ increases, and
the curve is smoothest for the highest σ. This can be explained by the fact that, as the
Gaussian parameter increases, the weighting (Gaussian) function takes large values over a
wider range around the central wavelength. A large band of wavelengths around the central
wavelength then has almost the same chance of being optimized, so the dispersion is small
and uniform over this band. In other words, the width of this band can be controlled by the
σ parameter. The dispersion slope is strongly affected by σ: increasing σ decreases the
slope markedly. This is clearly visible in Figure 30, which shows the dispersion slope
versus wavelength with σ as a parameter. The weighted-function-based GA optimization yields
the optical and geometrical parameters listed in Table 10. We find that the optimal value of
R2 is close to -1 for all Gaussian parameters.
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Fig. 29. Dispersion versus Wavelength for different Gaussian parameter 







Fig. 30. Dispersion Slope versus Wavelength for different Gaussian parameter 



| Type | a (µm) | P | Q | R1 | R2 | Δ |
|---|---|---|---|---|---|---|
| σ = 0.00 | 2.5462 | 0.6942 | 0.3046 | 7.0897 | -0.9517 | 4.886e-3 |
| σ = 1.2256e-8 | 2.4450 | 0.7189 | 0.4049 | 3.0497 | -0.9854 | 8.159e-3 |
| σ = 2.7869e-8 | 2.5763 | 0.8478 | 0.4774 | 2.2437 | -0.9877 | 7.178e-3 |
| σ = 3.6935e-8 | 2.4374 | 0.8461 | 0.4098 | 1.7966 | -0.7076 | 8.064e-3 |
| σ = 4.9467e-8 | 2.5914 | 0.8942 | 0.4728 | 2.2583 | -0.9780 | 6.812e-3 |

Table 10. Optimal Values for the Optical and Geometrical Parameters
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In tables 11 and 12 the simulated numerical values of dispersion and dispersion slope based 
on the presented algorithm for the wavelength interval [1.5, 1.6] µm are given. The
dispersion and dispersion slope differences across this band are also presented for each
case. In these cases, the change in dispersion across the band differs by a factor of about
6 between the traditional optimization and the weighted-function-based optimization.



| Type | D(λ = 1.5 µm) (ps/km/nm) | D(λ = 1.55 µm) (ps/km/nm) | D(λ = 1.6 µm) (ps/km/nm) | ΔD = D(1.6) - D(1.5) (ps/km/nm) |
|---|---|---|---|---|
| σ = 0.00 | -2.039 | -0.032e-3 | 2.002 | 4.041 |
| σ = 1.2256e-8 | -0.899 | 0.0238 | 0.765 | 1.664 |
| σ = 2.7869e-8 | -0.618 | 0.0032 | 0.363 | 0.981 |
| σ = 3.6935e-8 | -0.606 | 0.0051 | 0.346 | 0.952 |
| σ = 4.9467e-8 | -0.510 | 0.0034 | 0.191 | 0.701 |

Table 11. Simulated Numerical Results for Dispersion for the wavelength interval [1.5, 1.6] µm



| Type | S(λ = 1.5 µm) (ps/km/nm²) | S(λ = 1.55 µm) (ps/km/nm²) | S(λ = 1.6 µm) (ps/km/nm²) |
|---|---|---|---|
| σ = 0.00 | 0.0416 | 0.0402 | 0.0401 |
| σ = 1.2256e-8 | 0.0206 | 0.0164 | 0.0133 |
| σ = 2.7869e-8 | 0.0153 | 0.0096 | 0.0488 |
| σ = 3.6935e-8 | 0.0152 | 0.0093 | 0.0044 |
| σ = 4.9467e-8 | 0.0137 | 0.0068 | 0.0007 |

Table 12. Simulated Numerical Results for Dispersion Slope for the wavelength interval
[1.5, 1.6] µm
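
The ΔD column of Table 11 and the factor of about 6 quoted in the text can be recomputed directly from the tabulated end-point dispersions. The short Python check below uses illustrative variable names and the values of Table 11 only.

```python
# Gaussian parameter -> (D at 1.5 um, D at 1.6 um), both in ps/km/nm
table11 = {
    0.0:       (-2.039, 2.002),
    1.2256e-8: (-0.899, 0.765),
    2.7869e-8: (-0.618, 0.363),
    3.6935e-8: (-0.606, 0.346),
    4.9467e-8: (-0.510, 0.191),
}

# Dispersion change across the band for each Gaussian parameter.
delta_d = {s: round(d16 - d15, 3) for s, (d15, d16) in table11.items()}

# Unweighted case compared with the widest Gaussian weighting.
ratio = delta_d[0.0] / delta_d[4.9467e-8]
```

The recomputed differences match the tabulated ΔD column, and the ratio between the unweighted case and the largest Gaussian parameter is about 5.8, in line with the factor of about 6 stated in the text.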

As a final result of these simulations, we point out that the zero-dispersion wavelength is
set more accurately for a zero value of the Gaussian parameter than for nonzero values.
Comparing the present results with those demonstrated earlier for a dispersion-flattened
optical fiber shows that the latest design has a considerable bandwidth, that is, the band
between the two zero-dispersion wavelengths. Moreover, the variation of the dispersion over
this interval is very small, which is a direct result of its small slope.



10. Conclusion 

In this chapter, some special fiber structures for broadband optical fiber communications
were reviewed. For the three cases R, W, and M, two different types of each were considered
and discussed in detail. We have shown that the design method proposed in this chapter
provides a systematic approach for broadband applications and that the fibers considered
here make ultra-broadband communications possible.




11. References

[1] Kazumasa Ohsono, Tomoyuki Nishio, Takahiro Yamazaki, Tomomi Onose, and Kotaro Tan, "Low Non-linear Dispersion-shifted Fiber for DWDM Transmission", Hitachi Cable Review, Vol. 19, 2000.
[2] J. A. Baghdadi, A. Safaai-Jazi, and H.T. Hattori, "Optical fibers with low nonlinearity and low polarization-mode dispersion for terabit communications", Optics & Laser Technology, Vol. 33, pp. 285-291, 2001.
[3] S. Makouei, M. Savadi-Oskouei, A. Rostami, and Z.D.K. Kanani, "Dispersion flattened optical fiber design for large bandwidth and high-speed optical communications using optimization technique", Progress In Electromagnetics Research B, Vol. 13, pp. 21-40, 2009.
[4] Govind P. Agrawal, "Fiber-Optic Communication Systems", (John Wiley & Sons, Third Edition), 2002.
[5] T. Kato, M. Hirano, A. Tada, K. Fukuada, T. Fujii, T. Ooishi, Y. Yokoyama, M. Yoshida, and M. Onishi, "Dispersion flattened transmission line consisting of wide-band non-zero dispersion shifted fiber and dispersion compensating fiber module", Optical Fiber Technology, Vol. 8, pp. 231-239, 2002.
[6] R. Tewari, B.P. Pal, and U.K. Das, "Dispersion shifted dual shape core fibers: Optimization based on spot size definitions", Journal of Lightwave Technology, Vol. 10, pp. 1-5, 1992.
[7] Dipankar Ghosh, Debashri Ghosh, and Mousumi Basu, "Designing a graded index depressed clad non-zero dispersion shifted optical fiber for wide band transmission system", Optik, Vol. 119, pp. 63-68, 2008.
[8] B. Mikkelsen, G. Raybon, B. Zhu, R.J. Essiambre, P.G. Bernasconi, K. Dreyer, L.W. Stulz, S.N. Knudsen, "High spectral efficiency (0.53 bit/s/Hz) WDM transmission of 160 Gb/s per wavelength over 400 km of Fiber", Technical Digest of OFC 2001, 2001, Paper ThF2.
[9] T. Ito, K. Fukuchi, K. Sekiya, D. Ogasawara, R. Ohhira, T. Ono, "6.4 Tb/s (160 x 40 Gb/s) WDM transmission experiment with 0.8 bit/s/Hz spectral efficiency", ECOC 2000, 2000, Postdeadline paper PD1.1.
[10] J. Kani, K. Hattori, M. Jinno, T. Kanamori, K. Oguchi, "Triple-wavelength-band WDM transmission over cascaded dispersion-shifted fibers", Technical Digest of OAA'99, 1999, Paper WC2.
[11] K. Fukuchi, T. Kasamatsu, M. Morie, R. Ohhira, T. Ito, K. Sekiya, D. Ogasawara, T. Ono, "10.92-Tb/s (273 x 40-Gb/s) triple-band/ultra-dense WDM optical-repeatered transmission experiment", OFC 2001, Postdeadline paper PD24, 2001.
[12] A. Naka et al., Lightwave Technol., vol.12, no.2, February (1994) pp. 280-287 
[13] Y. Liu et al., OFC96, WK15, 1996. 
[15] R.K. Varshney, A.K. Ghatak, I.C. Goyal, and C. siny Antony, "Design of a flat field fiber 

with very small dispersion slope," Optical Fiber Technology, vol. 9, pp. 189-198, 

2003. 
[16] J. Sakamoto, J. Kani, M. Jinno, S. Aisawa, M. Fukui, M. Yamada, K. Oguchi, "Wide 

wavelength band (1535-1560 nm and 1574-1600 nm), 28 x 10 Gbit/s WDM 

transmission over 320 km dispersion shifted fiber", Electron. Letter, vol. 34, pp. 

392-394, 1998. 



A Novel Multiclad Single Mode Optical Fibers for Broadband Optical Networks 139 

[17] S. Yashida, S. Kuwano, K. Iwashita, "10 Gbit/s x 10 channel WDM transmission 

experiment over 1200 km with repeater spacing of 100 km without gain 

equalization or pre-emphasis", Optical Fiber Communication (OFC) 96, San Jose, 

CA, TuD6, 1996. 
[18] A.R. Chraplyvy, "Limitation on lightwave communications imposed by optical fiber 

nonlinearities", Lightwave Technology, vol. 8, pp. 1548-1557, 1990. 
[19] R.W. Tkach, A.R. Chraplyvy, F. Fabrizio, A.H. Gnauck, R.M. Derosier, "Four photon 

mixing and high speed WDM systems", Lightwave Technology, vol. 13, pp. 841- 

849,1995. 
[20] Y. Akasaka, "New optical fibers for high bit rate and high capacity transmission", SPIE 

Proc, Vol. 3666, pp. 23-29, 1999. 
[21] Y. Liu, A.J. Antos, VA. Bhagavatula, MA. Newhouse," Single mode dispersion shifted 

fiber with effective area larger than 80 urn 2 and good bending performance", Proc. 

of ECOC'95, TuL2.4, 1995. 
[22] M. Tateda, Y. Kato, S. Seikai, and N. Uchida, "Design consideration on single mode 

fiber parameters," Pans. IECE Japan, (in Japanese) vol. J56-B, pp. 324-331, 1982. 
[23] K.I. Kitayama, Y. Kato, M. Ohashi, Y. Ishida, and N. Uchida, "Design considerations for 

the structural optimization of a single mode fiber," Ligthwave Technology, vol. 1, 

pp. 363-369, 1983. 
[24] M. Savadi-Oskouei, S. Makouei, A. Rostami, and Z.D. Koozeh Kanani, "Proposal for 

optical fiber designs with ultrahigh effective area and small bending loss applicable 

to long haul communications", applied Optics, vol. 46, pp. 6330-6339, 2007. 
[25] Y. Namihira, "Relationship between nonlinear effective area and mode-diameter for 

dispersion shifted fiber," Electron. Letter, vol. 30, pp. 262-264, 1994. 
[26] M. Hautakorpi and M. Kaivola, "Modal analysis of M-type dielectric-profile optical 

fibers in the weakly guiding approximation," Optical Society of America A., vol. 22, 

pp. 1163-1169, 2005. 
[27] F. D. Nunes and C. A. de Souza Melo, "Theoretical study of coaxial fibers," Applied 

Optics, vol. 35, pp. 388-398, 1996. 
[28] D. E. Goldberg, Genetic Algorithms in Search, Optimization and Machine Learning 

(Addison-Wesley, 1989). 
[29] X. Tian and X. Zhang, " Dispersion flattened design of large effective area single mode 

fibers with ring index profiles", Optics Communications, vol. 230, pp. 105-113, 

2004. 
[30] Z. O. Tseng, "On the Convergence of the Coordinate Descent Method for Convex 

Differentiable Minimization", Optimization Theory and Applications, vol. 72, pp. 7- 

35, 1992. 
[31] S. Yin, K.w. Chung, H. Liu, P. Kurtz, and K. Reichard, " A new design for non-zero 

dispersion shifted fiber (NZ-DSF) with a large effective area over 100p,m 2 and low 

bending and splice loss", Optics Communications, vol. 177, pp. 225-232, 2000. 
[32] X. Jiang and R. Wang, "Non-zero dispersion shifted optical fiber with ultra large 

effective area and low dispersion slope for terabit communication systems", optics 

Communications, vol. 236, pp. 69-74, 2004. 
[33] B.H. Kim, Y. Park, D.Y. Kim, U.C. Paek, W.-T. Han, in: OFC_2002, Technical Digest, 

2002, pp. 173-174. 
[34] U.C. Pack, C.R. Kurkjian, J. Am. Ceram. Soc. 58 (1975)330. 



140 Advances in Solid State Circuits Technologies 

[35] S. Matsuo, S. Tanigawa, K. Himeno, K. Harada, in: OFC_2002, Technical Digest, 2002, 

pp. 329-330. 
[36] A. Safaai-Jazi, "Large effective area fibers: propagation properties and optimum 

designs", SPIE Proc, Vol. 3666, pp. 30-39, 1999. 
[37] P. Nouchi, P. Sansonetti, J. Von Wirth, and C. Le Sergent," New dispersion shifted fiber 

with effective area larger than 90 urn 2 ", Proc. of ECOC'96, MoB3.2, 1996. 
[38] V. Silva, "A new design for dispersion shifted fiber with an effective area larger than 

100 um2 and good bending characteristics," Proc. of OFC'98, ThKl, p. 301, 1998. 
[39] N. Kumano, K. Mukasa, S. Matsushita, T. Yagi, "Zero dispersion slope NZ-DSF with 

ultra wide bandwidth over 300 nm", Proc. of ECOC'2002, 2002. 
[40] B. Zhu, L.E. Nelson, L. Leng, S. Stulz, S. Knudsen, D. Peckham, "1.6 Tbits/s (40 x 42.7 

Gbit/s) WDM transmission over 2400 Km of fiber with 100 Km dispersion managed 

spans", Electron. Letters, vol. 38, pp. 647-648, 2002. 
[41] P. Nouchi, P. Sansonetti, J. Von Wirth, C. Le Sergent," New dispersion shifted fiber with 

effective area larger than 90 um2", Proc. of ECOC'96, MoB3.2, 1996. 
[42] M. Savadi-Oskouei, A. Rostami, and S. Makouei," A novel fiber design strategy for 

simultaneously introducing ultra small dispersion and dispersion slope using 

genetic algorithm", European Transaction on Telecommunications, vol. 20, pp. 37- 

47, 2009. 
[43] S. Kawakami and S. Nishida, "Anomalous dispersion of new doubly clad optical 

fibers," Electron Letters, vol. 10, pp. 38-40, 1974. 
[44] S. Kawakami, S. Nishida, and M. Sumi, "Transmission characteristics of W-type optical 

fibers", Proc. Inst. Elec. Eng., vol. 123, pp. 586-590, 1976. 
[45] B. James and C.R. Day, "A review of single mode fibers with modified dispersion 

characteristics", Lightwave Technology, vol. 4, pp. 967-979, 1986. 



8 



Continuous-Time Analog Filtering: Design Strategies and Programmability in CMOS
Technologies for VHF Applications

Aranzazu Otin, Santiago Celma and Concepcion Aldea 
Group of Electronic Design, Aragon Institute for Engineering Research (I3A). 

Zaragoza University, Zaragoza. 
Spain 



1. Introduction 

The evolution of wireless applications (in performance as well as in the number of users) has
undergone explosive growth in recent years, resulting in an increasing demand for smaller,
low-cost wireless transceivers with low power consumption. To meet this demand,
continuous development must take place both in CMOS technology and in RF electronics,
with the goal of achieving a fully-integrated single-chip receiver in a low-cost CMOS
process. This demand for complex read-channel and multi-standard receiver ICs calls for the
design and implementation of one category of analog interface chips, continuous-time (CT)
filters, suitable for high speed with variable bandwidths over a wide frequency range,
preferably using the Gm-C approach rather than other existing solutions.
Filters based on the Gm-C technique were used quite early on with bipolar technology and
have now become the dominant option for implementing monolithic filters at very high
frequencies. The basic building block of a Gm-C filter is the integrator, which uses only
transconductors and capacitors and whose structure is therefore simpler than that of
alternatives such as operational amplifiers. The simplicity of the transconductor, coupled
with open-loop operation that does not involve any complex frequency compensation
scheme, points to this cell as the basic active element to be considered and the best option
for operating in the VHF range with low supply voltages.




Fig. 1. Ideal transconductor: Vin-to-Io converter of transconductance gm (conversion factor).

All the benefits of the Gm-C approach rely on the ideal behaviour of the transconductor.
Nevertheless, its use as the basic element in VHF active filter implementations forces one
to consider some drawbacks related to the non-idealities of this fundamental cell: finite
output resistance, finite bandwidth, noise, non-linearity, etc. The main disadvantages
inherent to this technique are its high sensitivity to parasitic capacitors and the non-linear
behaviour of the transconductor, which gives rise to distortion generated mainly in the
V-I conversion. Specific strategies are required to minimize these effects.

By using differential or balanced transconductor structures, distortion is reduced
(even-order non-linear components are cancelled) and better immunity to common-mode
noise is obtained. Furthermore, tuning techniques compensate for parameter deviations due
to process and temperature variations. These ideas, together with a careful layout, a detailed
study of the technology and a deep analysis of the device, lead to an improvement in the
transconductor behaviour and, consequently, in the filter performance. Thus, when
designing an active Gm-C filter, the effects of transconductor non-idealities must be
analysed in depth to achieve optimum filter performance. The implementation of the
transconductor should strike a trade-off between dc-gain, linearity and low phase error at
the cut-off frequency.

Any pole or zero frequency in filters based on the Gm-C technique is of the Gm/C type. This
means that there are two fundamental ways of programming the frequency response of the
filter: keeping Gm constant and varying C, or vice versa. The choice of approach affects
noise and power dissipation (Pavan et al., 2000). The constant-C approach has the
advantage of keeping the noise specifications constant over the entire programming range
while decreasing the power consumption at lower frequencies. For these reasons, the
constant-C scaling technique is the preferred approach for implementing filters operating in
a very high frequency range, which shifts the focus to the design of tunable CMOS
transconductors. On the other hand, discrete tuning is currently more widely used than
continuous tuning, both to preserve the dynamic range and to take advantage of the digital
system in a mixed-signal design to determine the control signal that calibrates and
reconfigures the filter. A possible discrete tuning technique is based on a parallel connection
of transconductors, where the desired time-constant can be digitally programmed (Pavan et
al., 2000a). This approach keeps the Q-factor constant and maintains an adequate dynamic
range over the entire bandwidth setting.
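
The constant-C programming idea can be illustrated numerically. The following minimal Python sketch assumes a hypothetical bank of identical unit transconductors switched in parallel by a digital code, so that the pole frequency scales as Gm/C with C fixed. The names `GM_UNIT`, `C_INT` and `N_BITS` and their values are illustrative assumptions, not figures from this chapter.

```python
import math

GM_UNIT = 50e-6   # assumed unit transconductance in siemens (illustrative value)
C_INT = 1e-12     # assumed fixed integration capacitance in farads (illustrative value)
N_BITS = 4        # assumed word length of the digital control code


def total_gm(code: int, gm_unit: float = GM_UNIT) -> float:
    """Total transconductance of the unit cells enabled by the digital code.

    Cells connected in parallel add, so Gm = code * gm_unit.
    """
    if not 0 < code < 2 ** N_BITS:
        raise ValueError("code must enable at least one cell and fit in N_BITS")
    return code * gm_unit


def pole_frequency_hz(code: int, c_int: float = C_INT) -> float:
    """Pole/zero frequency f = Gm / (2*pi*C) with C held constant."""
    return total_gm(code) / (2.0 * math.pi * c_int)


if __name__ == "__main__":
    for code in (1, 2, 4, 8, 15):
        print(f"code={code:2d}  f = {pole_frequency_hz(code) / 1e6:8.2f} MHz")
```

Because every time-constant of the filter is scaled by the same code, the ratios between pole frequencies, and hence the Q-factors, are unchanged under this assumption.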

The target is to implement a transconductor that is compatible with the latest low-cost pure
digital CMOS technologies and programmable over a very high frequency range while
maintaining an adequate dynamic range (DR). The specific values of these specifications
depend on each particular application. This work does not focus on a specific application
but on an overall analysis to find the structure that provides the best trade-off between
operating frequency, programmability, dynamic range and power consumption.
Considering all these points, an optimal solution for digitally programmable analog filters in
the VHF/UHF range is to take advantage of current-mode pseudo-differential topologies and
endow them with digital programmability. The design strategy is therefore as follows. First,
the transconductor parameters that limit its ideal behaviour are analysed and a very
well-known current-mode topology (Smith et al., 1996; Zele et al., 1996) is characterized.
Next, starting from the Zele-Smith architecture, two different transconductors are presented
and analysed in depth, and all the characteristic parameters of each active cell are obtained.
Finally, programmability is added to the VHF transconductors and the experimental results
of a low-cost 0.35 μm CMOS implementation are presented. As the active cell is based on a
classical structure, a broad diversity of digitally programmable and continuously tunable CT
filters can be obtained, where the programmability of the filter is achieved through the
design of a generic programmable transconductor. The use of the MOS structure as an
intended passive device is probably as old as the MOS transistor concept itself. Since
standard digital technologies lack special capacitor structures, an alternative for
implementing linear capacitors is to use the gate-to-channel capacitance of MOSFET devices,
where the gate-oxide thickness is a well-controlled variable in the process. This option will
be considered in this work.

Therefore, in this chapter we will show the best way to implement key analog building
blocks of a high-speed system in a CMOS technology with a wide programmable frequency
range, considering new design techniques and uncovering potential problems associated
with the design of high-speed analog circuits using short-channel, low-voltage devices.
These are the challenges of CMOS filter design at very high frequencies, and this study
addresses the theoretical and practical problems encountered in the design of robust,
programmable continuous-time filters with very high bandwidths implemented in low-cost
digital CMOS technologies.

2. The Integrator: building-block in the Gm-C technique 

Most continuous-time (CT) integrated filters, i.e., circuits that require high-frequency
operation at low silicon and power cost, have a frequency response controlled by
time-constants, and one of the simplest ways to implement these time-constants is the
integrator structure. Therefore, the integrator is the dominant building block in many
high-frequency active circuit design techniques, and its frequency response and linearity
directly determine the filter performance.

Accordingly, systems based on the Gm-C technique are the first option for implementing CT
filters, thanks to their acceptable performance over the VHF range. The active building
element used by the Gm-C filter approach, based on an open-loop integrator, is the
transconductor cell (Fig. 1), which ideally delivers an output current proportional to the
input signal voltage:

$$ I_o = g_m V_{in} \tag{1} $$

where gm is the transconductance of the element. When a grounded capacitor is connected
to the output node of the transconductor to collect this current, an integrator is obtained
that performs a Vin-to-Vo conversion, as shown in Fig. 2(a). An ideal voltage-mode
integrator is thus obtained with a simple transconductance-capacitor combination. A second
structure can be considered for current-mode signal processing, so that two different, yet
completely equivalent, topologies are obtained. In this case, the input current flows through
the integration capacitance to produce the transconductor input voltage, and the active cell
then delivers the output current. Thus, Fig. 2(b) shows the Iin-to-Io conversion.





Fig. 2. Ideal integrator: (a) voltage-mode, Vo/Vin = gm/(s·CI); (b) current-mode,
Io/Iin = gm/(s·CI).

Because most parasitic capacitors of the active cell are grounded (the total effective
parasitic capacitance at the output or input node, depending on the configuration), they
must be counted as part of the total integration capacitance CI, and their share is
particularly significant at high frequencies. In the extreme case, the proposed transconductor
can be used as an integrator whose total integration capacitance CI consists only of these
parasitic capacitances, with no need for any external capacitor. Nevertheless, these
capacitances are not linear and, depending on their contribution, the total linearity of the
system will be affected. As process variations also affect the value of these parasitic
capacitances, sensitivity to them requires a detailed study of the device models and
integration technology together with a careful system layout.

The ideal integrator has infinite dc-gain and no parasitic effects, and thus a phase of -π/2
at all frequencies. Its unity-gain frequency is ωt = gm/CI. A real integrator, however,
presents a non-zero transconductor output conductance gout and parasitic poles and zeros,
which distort the transfer function:

$$ H(s) = A_{DC} \, \frac{1 - s/\omega_2}{1 + s/\omega_1} \tag{2} $$

where ADC = gm/gout is the dc-gain and ω1 = ωt/ADC = gout/CI is the frequency of the
dominant pole. The effects of parasitic poles and zeros at frequencies much higher than the
frequency range of the transconductor can be modelled with a single effective zero ω2:
positive ω2 results in an effective parasitic RHP-zero and negative ω2 in an LHP-zero.
Non-zero transconductor output conductance gout causes finite dc-gain in the real
integrators of the filter. In addition, parasitic poles and zeros in the integrator transfer
function, together with finite ADC, make the integrator phase response deviate from -π/2,
and it is well known that phase error is the main source of malfunctions in filters. In
particular, phase deviations around ωt can cause significant errors in the filter transfer
function, depending on the filter quality factors. The accuracy of the overall frequency
response of the filter depends on how closely the individual integrators follow the ideal
response. The filter remains very close to the ideal one if the integrator phase at its
unity-gain frequency ωt is equal to its ideal value -π/2; the amount by which the phase at ωt
deviates from this value will be called Δφ(ωt).

$$ \varphi(\omega_t) = -\frac{\pi}{2} + \tan^{-1}\frac{\omega_1}{\omega_t} - \tan^{-1}\frac{\omega_t}{\omega_2} \;\Rightarrow\; \Delta\varphi(\omega_t) \approx \tan^{-1}\frac{1}{A_{DC}} - \tan^{-1}\frac{\omega_t}{\omega_2} \tag{3} $$

Low dc-gain causes a leading phase error, and parasitic high-frequency poles and zeros in
the integrator create lagging (ω2 > 0, RHP-zero) or leading (ω2 < 0, LHP-zero) phase errors.
The acceptable worst-case value of Δφ(ωt) depends on the specifications for the
high-frequency response of the overall filter and on the poles and quality factor of the
transconductor transfer function. The integrator phase error can be modelled with a
frequency-dependent integrator quality factor Qint (Nauta, 1993), from which it follows that
a high and accurate filter quality factor puts strong constraints on the integrator phase
error, i.e. on Qint.
The filter performance is dominated by the performance of the transconductors, since the
filter specifications (dynamic range, dissipation and chip area) depend not only on filter
properties (Q, cut-off frequency, impedance level) but also on transconductor properties
(ADC, ωt, ω1, ω2, noise behaviour, linearity, area and power consumption). It is therefore
worthwhile to study a high-performance transconductor that improves all these
specifications, in order to obtain a proper design for these VHF filter building blocks.
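
As a numerical illustration of the integrator model of Eq. (2) and the phase error of Eq. (3), the following Python sketch evaluates the phase of H(jω) at ωt directly and compares it with the closed-form expression. The function names and the example values (a dc-gain of 100, ωt at 100 MHz, ω2 at 5 GHz) are assumptions chosen only for illustration.

```python
import cmath
import math


def integrator_response(omega, a_dc, omega_t, omega_2):
    """Eq. (2): H(jw) = A_DC * (1 - jw/w2) / (1 + jw/w1), with w1 = wt / A_DC."""
    omega_1 = omega_t / a_dc
    s = 1j * omega
    return a_dc * (1 - s / omega_2) / (1 + s / omega_1)


def phase_error_exact(a_dc, omega_t, omega_2):
    """Deviation of the phase of H(j*wt) from the ideal -pi/2, in radians."""
    h = integrator_response(omega_t, a_dc, omega_t, omega_2)
    return cmath.phase(h) + math.pi / 2


def phase_error_eq3(a_dc, omega_t, omega_2):
    """Eq. (3): atan(1/A_DC) - atan(wt/w2), in radians."""
    return math.atan(1.0 / a_dc) - math.atan(omega_t / omega_2)


if __name__ == "__main__":
    wt = 2 * math.pi * 100e6   # assumed unity-gain frequency
    w2 = 2 * math.pi * 5e9     # assumed effective parasitic zero (RHP, w2 > 0)
    for a_dc in (10.0, 100.0, 1000.0):
        exact = math.degrees(phase_error_exact(a_dc, wt, w2))
        approx = math.degrees(phase_error_eq3(a_dc, wt, w2))
        print(f"A_DC={a_dc:7.1f}  exact={exact:+.4f} deg  Eq.(3)={approx:+.4f} deg")
```

A positive result corresponds to a phase lead (dominated by finite dc-gain) and a negative result to a phase lag (dominated by an RHP-zero), in line with the discussion above.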

3. Fully-balanced pseudo-differential transconductor cell 

In this section, the development of a fully-balanced current-mode integrator based on a
classical structure is described. The structure is characterized by low power consumption,
high rejection of supply noise and potential for VHF applications. Fig. 3 shows the
conceptual scheme of the Zele-Smith pseudo-differential integrator (Smith et al., 1996; Zele
et al., 1996), a complete fully-balanced transconductance cell arranged as a current-mode
integrator.




Fig. 3. Conceptual scheme of the complete fully-balanced current-mode transconductor. 

To understand the basic operation, we analyse a simple first-order model of the proposed
transconductor in which each unit cell is a single transistor, i.e., a single common-source
stage as shown in Fig. 4. Under these conditions, the small-signal analysis gives the
expression for the differential gain of the integrator (Eq. 5), where gmi is the
transconductance of cell i and go' is the sum of the output conductances gdsi at the input
node.




Fig. 4. Small-signal model for the common-source transconductor stage. 

By analysing this expression and considering a first-order approximation, i.e., neglecting the
gds effects of each transistor, an infinite dc-gain is achieved if perfect matching is obtained
between gm1 and gm2, so that δgm = gm1 - gm2 = 0. Nevertheless, the effect of the output
conductances cannot be avoided, and the implementation of a negative resistance (δgm < 0),
inherent to this topology, makes dc-gain enhancement possible. Note that as δgm + go' → 0,
|ADC| → ∞. In practice, mismatch between transistors limits the differential gain to about
55 dB at most. An equivalent way of analysing this improvement is to consider the
differential-mode input resistance of the transconductor cell.



$$ R_{D(in)} \approx \frac{1}{\delta g_m + g_o'} \tag{5} $$

As a result, this scheme shows the basic pseudo-differential structure obtained by
considering two dual transconductor cells (gm), leading to current integration through the
input capacitance CI. Thanks to the additional negative resistance shown in grey in the same
figure, dc-gain is increased by providing positive feedback compensation for the signal
current and boosting the input resistance of the transconductor.

The approximate common-mode gain, which must be less than unity to guarantee stability
in closed-loop configurations, is constrained by device ratios to a stable value over all
frequencies (Eq. 6). Common-mode stability is assured by designing (gm1 + gm2)/gm > 1.
The common-mode behaviour can also be analysed by calculating the common-mode input
resistance.
The common-mode feedback resulting from the interconnection of the negative resistance
provides both a naturally high differential gain and a low common-mode gain for the
integrator, improving on the limits of a real integrator structure. Consequently, the basic
operation of the transconductor is best understood by noting, first, that the common-mode
control and dc-enhancement circuitry is connected at the input of the circuit and, second,
that the linear V-I conversion occurs in the output stage.
The gain of the basic current integrator is independent of the supply voltage to a
first-order approximation. When fully-differential current topologies are used, the small
remaining supply noise feedthrough is common to both sides of the signal and thus has no
direct effect, except through random device mismatch. Therefore, the integrator has good
immunity to supply noise. In many applications, device mismatch can be reduced to around
0.1-1 % with careful layout and specific design techniques (Croon et al., 2002; Otin et al.,
2004; Otin et al., 2005).

The use of an integrator based on transconductance cells implemented with single
transistors (no internal nodes) results in a good frequency response because the only
nodes are at the inputs and at the outputs. To a first-order approximation, no parasitic poles
or zeros exist in the differential ac-response of the basic integrator circuit. Both differential
and common-mode gains can be set independently through the values of gm1 and gm2.
The ideal integrator function results from setting δgm + go' = 0, and the phase error at the
unity-gain frequency, ωt = gm/CI, can be calculated by:

$$ \Delta\varphi(\omega_t) = \tan^{-1}\frac{\omega_1}{\omega_t} = \tan^{-1}\frac{\delta g_m + g_o'}{g_m} = \frac{\pi}{2} - \tan^{-1}\frac{g_m}{\delta g_m + g_o'} \tag{7} $$

To summarize, infinite differential input impedance can be obtained if δgm + go' → 0, which
also maximizes the differential dc-gain and minimizes the phase error at ωt, and the
common-mode input impedance can be reduced by maximizing the sum of the
transconductances (gm1 + gm2). Consequently, the common-mode rejection ratio (CMRR) is
improved. Nevertheless, an important point should be borne in mind: as the dc-gain
depends on the difference (go' - |δgm|), the structure can become unstable if this quantity
becomes negative (total negative input conductance) due to overcompensation.
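
The trade-off summarized above can be sketched numerically. The Python example below takes the dc-gain as gm/(δgm + go'), which is the relation implied by Eq. (7), and reports the phase error and the overcompensation condition. The component values are hypothetical and serve only to show the trend.

```python
import math


def dc_gain(gm, delta_gm, go):
    """Differential dc-gain A_DC = gm / (delta_gm + go'), as implied by Eq. (7).

    Raises ValueError when the total input conductance is negative, which is
    the overcompensated (potentially unstable) case described in the text.
    """
    g_in = delta_gm + go
    if g_in < 0:
        raise ValueError("overcompensated: total input conductance is negative")
    if g_in == 0:
        return math.inf
    return gm / g_in


def phase_error_rad(gm, delta_gm, go):
    """Eq. (7): phase error at wt, atan((delta_gm + go') / gm), in radians."""
    return math.atan(1.0 / dc_gain(gm, delta_gm, go))


if __name__ == "__main__":
    gm = 1e-3    # assumed output-stage transconductance (S)
    go = 50e-6   # assumed total output conductance go' at the input node (S)
    for delta_gm in (0.0, -25e-6, -45e-6):
        a = dc_gain(gm, delta_gm, go)
        err = math.degrees(phase_error_rad(gm, delta_gm, go))
        print(f"delta_gm={delta_gm:+.1e} S  A_DC={20 * math.log10(a):5.1f} dB  error={err:.3f} deg")
```

Making δgm more negative raises the gain and lowers the phase error until δgm + go' changes sign, at which point the sketch raises an error to flag overcompensation.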

Analysis of the small-signal model of each common-source stage forming the complete
transconductor topology (Fig. 4) reveals a frequency problem that needs to be solved: the
feedforward ac-current path from the gate (input) to the drain (output) through the overlap
parasitic capacitance Cgd. For the stages forming the negative resistance, this effect is not
important because its contribution to the total behaviour of the cell decreases, as these
capacitances are short-circuited with their respective gmi. However, the feedforward current
through Cgd in the fully-balanced output stage gm of the transconductor structure generates
a transmission zero (gm/Cgd) in the right half of the complex plane. This parasitic RHP-zero
modifies the integrator frequency response and creates a phase lag at the unity-gain
frequency. Furthermore, the Miller effect, also associated with this parasitic capacitor,
introduces a larger equivalent input capacitance (Cin = Cgs + Cgb ⇒
Cin = Cgs + Cgb + (1 + A)Cgd) and an additional component in the equivalent output
capacitance (Cout = Cbd ⇒ Cout = Cbd + (1 + A^-1)Cgd), where A ≈ gm/gds. Therefore,
neutralizing this effect will enhance the bandwidth.
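
To give a feel for the magnitudes involved, the following Python sketch evaluates the Miller-loaded input and output capacitances and the RHP-zero frequency gm/Cgd from the expressions above. All device values are hypothetical and are not taken from the chapter.

```python
import math


def miller_capacitances(c_gs, c_gb, c_gd, c_bd, gm, g_ds):
    """Return (C_in, C_out) including the Miller contribution of C_gd.

    C_in  = C_gs + C_gb + (1 + A) * C_gd
    C_out = C_bd + (1 + 1/A) * C_gd, with A ~ gm / g_ds
    """
    a = gm / g_ds
    c_in = c_gs + c_gb + (1.0 + a) * c_gd
    c_out = c_bd + (1.0 + 1.0 / a) * c_gd
    return c_in, c_out


def rhp_zero_hz(gm, c_gd):
    """Frequency in hertz of the feedforward RHP zero located at gm / C_gd."""
    return gm / (2.0 * math.pi * c_gd)


if __name__ == "__main__":
    # Assumed small-signal values for a single common-source stage.
    gm, g_ds = 1e-3, 50e-6
    c_gs, c_gb, c_gd, c_bd = 20e-15, 2e-15, 5e-15, 8e-15
    c_in, c_out = miller_capacitances(c_gs, c_gb, c_gd, c_bd, gm, g_ds)
    print(f"C_in  = {c_in * 1e15:.2f} fF (without Miller: {(c_gs + c_gb) * 1e15:.2f} fF)")
    print(f"C_out = {c_out * 1e15:.2f} fF (without Miller: {c_bd * 1e15:.2f} fF)")
    print(f"RHP zero at {rhp_zero_hz(gm, c_gd) / 1e9:.1f} GHz")
```

With these assumed values, the Miller term dominates the input capacitance, which is why neutralizing Cgd directly improves the bandwidth.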

Many methods have been proposed to minimize the Miller effect: the cascode technique
(Gray et al., 2001), the inductor shunt-peaking technique (Mohan et al., 2000), the capacitive
compensation technique (Wakimoto et al., 1990; Vadipour, 1993), the distributed
amplification technique (Ahn et al., 2002) and the active inductor technique (Sackinger et al.,
2000). They all have the advantages of low-voltage compatibility and low area; however, the
solutions considered in this work are cascode structures together with the capacitive
compensation technique.

Differential systems allow the Cc-cancellation technique, which uses positive feedback to
generate negative capacitances that cancel the positive ones and thereby increase the
bandwidth. These Cc capacitors are the overlap Cgd parasitic capacitances of dummy
compensation transistors, cross-coupled to neutralize the feedback action of the Miller
capacitors. Each is connected between an output and the input of opposite sign, which is
available in a balanced configuration. Under these conditions, the RHP-zero is moved to
infinity, i.e., the cause of the phase lag is removed, thus expanding the bandwidth of the
transconductor. At the same time, the compensation capacitor Cc cancels the Miller effect,
and a lower effective input-node capacitance is obtained due to the reduction of the
feedforward effect. This technique depends on feeding back a current that is exactly equal to
the one flowing through the Miller capacitance Cgd and, consequently, the neutralization
capacitor must match it precisely. It should be noted, however, that Cgd is voltage-dependent,
so the compensation only works for small signals. If Cgd and Cc are mismatched, the
parasitic zero is not at infinity and can cause a small phase lag or lead.
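
The effect of a mismatch between Cgd and Cc can be sketched as follows. The example assumes, as a first-order simplification that is not stated explicitly in the chapter, that the cross-coupled capacitor leaves a net feedforward capacitance of Cgd - Cc, so that the zero moves to gm/(Cgd - Cc). The function names and numerical values are illustrative.

```python
import math


def neutralized_zero(gm, c_gd, c_c):
    """Zero frequency w2 = gm / (C_gd - C_c) in rad/s (assumed first-order model).

    Returns math.inf when C_c matches C_gd exactly.
    A positive result is a residual RHP zero (phase lag); a negative result
    means overcompensation, i.e. an LHP zero (phase lead).
    """
    residual = c_gd - c_c
    if residual == 0.0:
        return math.inf
    return gm / residual


def zero_phase_deg(omega_t, omega_2):
    """Phase contributed by the zero at wt, -atan(wt / w2), in degrees."""
    return -math.degrees(math.atan(omega_t / omega_2))


if __name__ == "__main__":
    gm = 1e-3                    # assumed transconductance (S)
    c_gd = 5e-15                 # assumed overlap capacitance (F)
    wt = 2 * math.pi * 500e6     # assumed unity-gain frequency (rad/s)
    for c_c in (0.0, 4.5e-15, 5e-15, 5.5e-15):
        w2 = neutralized_zero(gm, c_gd, c_c)
        print(f"Cc={c_c * 1e15:4.1f} fF  phase at wt = {zero_phase_deg(wt, w2):+.3f} deg")
```

Under this assumption, undercompensation leaves a small lag, exact matching removes the zero contribution, and overcompensation produces a small lead, which is the behaviour described in the paragraph above.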

Fig. 5. Cancellation of the transmission zero gm/Cgd and neutralization of the Miller effect:
Cc-cancellation technique.

However, this is not the full story of the high-frequency behaviour of the transconductor
cell, and there are further frequency limits. Mismatch in common-mode feedback circuits can
result in unexpected parasitic poles and zeros. In addition, high-frequency models of the
MOS transistor show that gm is not independent of frequency: gm(s) has a finite delay and
begins to roll off at very high frequencies. Although the frequency where this roll-off
begins can be in the GHz range, the phase shift from this effect can become significant at
much lower frequencies. Since most active filters are very sensitive to small phase changes
in the integrator response, it is important to take this effect into account.
The first way to minimize these effects is to eliminate the internal nodes or, if this is not 
possible, to design them as low-impedance nodes. This procedure can be carried out by 
considering cascode topologies. Moreover, their use further prevents bandwidth reduction 
generated by transmission zero because one side of C c is connected to the internal low- 
impedance node, i.e., to the low-gain point of the cascode transistor source. 
Therefore, an enhancement of the integrator dc-gain has been obtained with this topology 
by means of the differential negative resistance, increasing the differential input resistance 
of the transconductor and keeping the common-mode gain lower than unity. With regard to 
frequency limit-related problems, transmission zero has been reduced by using C c -capacitor 
compensation, taking advantage of the pseudo-differential topology. This solution can also 
be improved by considering cascode topologies, giving higher dc-gains in a natural way, 
which will also reduce the frequency drawbacks associated to the internal nodes of other 
topologies by avoiding internal high-impedance nodes in the signal path. 
As a result, a low -voltage transconductor with high linearity, very high operation frequency 
and high power efficiency has been designed where cascode structures should be 
considered to obtain an improvement in the high-frequency behaviour of the basic topology. 
The main advantage of using cascode stages instead of single common-source stages is the
higher dc-gain while maintaining a good frequency response. Hence, a higher quality factor
of the integrator is expected due to the higher differential dc-gain (Abidi, 1988). Basic
cascode circuits require high supply voltages to operate due to the large overhead bias from
threshold voltages. However, variations of the cascode technique exist which can be used
with lower voltage supplies. Two options are considered in this work, the so-called
high-swing cascode (HS) stage and the folded cascode (FC) stage (Baker et al., 1998; Sansen
et al., 1999; Sedra et al., 2004). The unit cells replacing the common-source stages
previously used are shown in Fig. 6. The complete fully-balanced current-mode
transconductance cells implemented by using these cascode stages are described in the
following sections.

Fig. 6. Unit transconductance cell: (a) high-swing (HS) and (b) folded cascode (FC) topology.

3.1 High-swing cascode section: HS topology 

Fig. 7(a) shows the transconductor arranged for using the current-mode integrator described 
in Fig. 3, where the unit cells have been implemented by using high-swing cascode stages 
(Baker et al., 1998; Sansen et al., 1999; Sedra et al., 2004). As illustrated in the corresponding
HS unit cell (Fig. 6(a)), current sources are also implemented by using high-swing cascode 
elements. The substrate terminals of NMOS transistors are connected to the reference 
voltage as usual, and those of the PMOS transistors are connected to the corresponding 
source node of each transistor. 

The use of high-swing cascode elements offers the same accuracy as basic cascode stages for
each unit cell of the transconductor but, because of the slightly different connection
between transistors, it needs a lower supply voltage and has fewer internal nodes generating
parasitic poles between the input and the output, which gives a better frequency response of
the integrator. The main disadvantage of the improved cascode topology is that, due to
biasing constraints, the gate-source voltages must be kept small, resulting in larger
devices for a given bias current level.



3.2 Folded cascode section: FC topology 

In order to obtain an improvement in biasing flexibility and a further reduction of the supply
voltage in the design of the transconductor cell, we can also take advantage of
folded-cascode sections (Sansen et al., 1999; Sedra et al., 2004). The schematic used to
describe the complete integrator based on the proposed current-mode pseudo-differential
transconductor is shown in Fig. 7(b), where the unit cells have been implemented by using
the FC stages illustrated in Fig. 6(b). In this case, current sources are implemented by using
single elements, both the bias sources $I_{bias}$ and the cascode sources $M_{NSi}$. The substrate
terminals of the NMOS transistors are connected to the reference voltage as usual, and those
of the PMOS transistors, both those used to implement the current sources $I_{bias}$ and those
implementing the folded transistors $M_{PFi}$, are connected to the Vcc node.

The use of folded cascode elements gives a substantial improvement in biasing flexibility,
because of the increased drain voltage of the transistors, at the cost of additional current
sources and bias voltages. Another significant benefit of these stages is that, by avoiding
the biasing constraints associated with the high-swing cascode structure, there is no need to
keep gate-source voltages low, which results in smaller and simpler devices for a given bias
current level, a lower supply voltage and larger unity-gain frequencies.

Fig. 7. Fully-balanced pseudo-differential current-mode cell, based on (a) high-swing
cascode unit stages: HS transconductor; (b) folded-cascode unit stages: FC transconductor.



3.3 General considerations: basic principle 

A similar notation and index-linking have been adopted in both transconductor
implementations in order to simplify the description of the basic principle and to unify the
topology analysis. Assuming ideal behaviour for the integrator, the balanced input current
flows entirely into integration capacitance $C_I$. Diode-connected stages $M_{3,6}$ adequately bias
the cascode output stages, whilst transistors $M_{2,5}$ provide positive feedback compensation
for the signal current flowing into $M_{3,6}$ and boost the input resistance of the integrator.
Regarding the gain enhancement by means of the negative resistance formed by transistors
$M_{2,5}$, this technique is absolutely necessary for Zele-Smith topologies and other
voltage-mode transconductors (Nauta, 1993), in order to obtain reasonably high dc-gain values.
Theoretically, the dc-gain could be made infinite by adjusting the equivalent negative
resistance, but in practice mismatch limits the dc-gain to about 40 dB in single-transistor
stages. However, the use of cascode topologies leads to a natural enhancement of this
parameter, and differential dc-gain values of up to 55 dB can be reached with identical
transistors under identical bias conditions by means of a design that is less sensitive to
mismatch. Nevertheless, there is a key difference between the HS and the FC cascode structure:

• The high output resistance is directly guaranteed thanks to the true cascode output 
stage exhibited by the HS transconductor. Therefore, positive feedback compensation is 
not necessary to boost differential resistance or enhance the dc-gain. 

• On the other hand, the output node of the FC transconductor is not a very high-impedance
node, and the negative resistance proves necessary to obtain real input resistance
enhancement. In this approach, positive feedback compensation for the signal current
flowing into $M_{3,6}$ is essential, boosting the input resistance of the integrator and
increasing dc-gain.

The reason for this difference is that the FC unit cell considered and shown in Fig. 6(b) is a
pseudo-cascode topology. To obtain an output impedance similar to that of the HS approach,
the $M_{NS}$ current source would have to be implemented by using a cascode current source.
Nevertheless, even if the HS implementation gives an immediate dc-gain/input
resistance improvement, both topologies require positive feedback compensation (negative
resistance) to reduce common-mode gain (common-mode input resistance) and stabilize
common-mode voltages. On the other hand, the use of pseudo-differential structures
requires careful and efficient control over the common-mode behaviour of the circuit. It is
worth noting that this structure not only stabilises the common-mode voltage, but also
rejects common-mode signals by means of partial positive feedback. This idea has already
been used for high-frequency transconductors in (Nauta, 1993) and for a class of
current-mode filters (Smith et al., 1996; Zele et al., 1996).

Thus, considering this topology implemented by using cascode stages, dc-gains of about 
55 dB and CMRR of 60 dB can be obtained with inherent stability of common-mode 
voltages. Note that the propagation of common-mode (CM) signals in balanced circuits can 
cause instability and distortion. Further, current consumption, linearity and 
transconductance value are strongly dependent on the CM input signals. Additional 
techniques can be used in the proposed topology if a greater CMRR is needed, such as 
feedforward cancellation of the input CM signal. Balanced transconductors with high-input 
common-mode rejection that are capable of operating with low-voltage supplies are 
obtained by using an additional transconductor that is only sensitive to CM signals 
(Baschirotto et al., 1994; Wyszynski et al., 1994). Considering this technique, CMRR values 
up to 70 dB could be obtained. 

4. High frequency response 

In this section, the bandwidth of the transconductor will be analysed. Note that if
single-transistor stages and unrealistically simplified models are used in the proposed
topology (Fig. 3), the bandwidth could be infinite owing to the absence of internal nodes
influencing the transfer function of the integrator. A more complete model of the MOS
transistor does predict a finite bandwidth due to second-order frequency effects such as the
transmission zero associated with the overlap parasitic capacitance $C_{gd}$, the frequency
dependence of the transconductance $g_m(s)$ and mismatch in common-mode feedback circuits.
A closer explanation of MOS behaviour at high frequencies (splitting it into an intrinsic and
an extrinsic part) is required before starting the study of the complete integrator. Taking
into account a non-quasistatic model (Tsividis, 1996), the high-frequency behaviour of the
current-mode integrator will be calculated.
When analysing transconductor bandwidth, several general factors must be considered: 

• The output of the complete transconductor may be assumed to be short-circuited for ac- 
signals when calculating the frequency response of the integrator. 

• In all the equations, $g_m$ is the transconductance, $g_{ds}$ the output conductance, $g_{mb}$ the
bulk transconductance, and $C_{gd}$, $C_{gs}$, $C_{ds}$, $C_{bs}$ and $C_{bd}$ the parasitic capacitances.

• All the unit cells are designed to seek perfect matching between them. Therefore, all
similar transistors have the same properties except for the transconductance $g_m$ of the
N-transistor processing the signal ($M_N$ transistor in both unit cells shown in Fig. 6). In
this way, considering the notation and index-linking previously used in Fig. 3:
$g_m(N_1) = A_N\,g_m(N)$; $g_m(N_2) = A_P\,g_m(N)$; $g_m(N_3) = g_m(N)$. Consequently,
$\delta g_m = (A_N - A_P)\,g_m$ represents the difference between $M_1$ and $M_2$, or $M_4$ and $M_5$, due
to the difference in dimensions and bias currents of the N-transistors, which gives rise to
the negative resistance that enhances the dc-gain of the system.

• Total integration capacitance $C_I$ comprises not only the external capacitance, but also
the contribution of parasitic capacitors ($C_I = C_{ext} + C_p$). Intrinsic capacitance $C_{gs}$ is the
main contribution to these parasitic capacitances $C_p$, and accounting for it as a large
fraction of the total integration capacitance is of great importance.

• External capacitor $C_{ext}$ can be implemented by using double-poly, metal-metal or MOS
capacitors, depending on the technological process.

• The current source is modelled with a Norton equivalent circuit, where $G_s$ and $C_s$ are the
conductance and capacitance components. The external capacitance $C_{ext}$ connected to the
transconductor input is in parallel with $C_s$. For simplicity, from now on $C_s$ will include
both the equivalent capacitance of the current source and the external capacitance
($C_s = C_s + C_{ext}$). Therefore, total integration capacitance can be expressed as
$C_I = C_s + C_p$, including parasitic effects, the external capacitance and the equivalent
model of the non-ideal current source.

Firstly, a model for the V-I conversion of the unit cell is derived, in both implementations to 
show the calculation process of the bandwidth of the complete transconductance cell. 



4.1 High-frequency model of the HS unit cell 

The following circuit represents the high-frequency model for the HS unit cell previously
shown in Fig. 6(a), where X(N) denotes the parameters associated with the $M_N$ transistor,
X(NC) those associated with the cascode transistor, X(P) those associated with the current
sources $I_{bias}$ and X(PC) those associated with the cascode current sources as shown in
Fig. 6(a). Table 1 summarizes the parameters associated with the impedances shown in the
small-signal equivalent circuit. The rest of the elements, $g_m(N)$, $g_{ds}(N)$, $g_{ds}(NC)$,
$g_{ds}(PC)$, $g_m(PC)$, $C_{gd}(N)$ and $C_{ds}(NC)$, directly represent the parameters of the
respective transistor.




Fig. 8. Equivalent high-frequency circuit for the HS unit cell. 

$g_M(NC) = g_m(NC) + g_{mb}(NC)$
$g(P) = g_{ds}(P)$
$C_{in}(N) = C_{gs}(N) + C_{gb}(N)$
$C(x) = C_{ds}(N) + C_{bd}(N) + C_{gs}(NC) + C_{bs}(NC)$
$C = C_{ds}(PC) + C_{bd}(PC)$
$C(P) = C_{gd}(P) + C_{ds}(P) + C_{bd}(P) + C_{gb}(PC) + C_{gs}(PC)$
$C_{out} = C_{gb}(NC) + C_{bd}(NC) + C_{gd}(PC)$

Table 1. Small-signal parameters for the HS unit cell.

4.2 High-frequency model of the FC unit cell

The following circuit represents the high frequency model for the FC unit cell previously 
shown in Fig. 6(b), where X(N) are the parameters associated to the MN-transistor, X(PF) 
those associated to the folded transistor, X(P) those associated to the current sources Ibias 
and X(NS) those associated to the current source of the folded transistor, which is 
implemented with a single NMOS transistor as previously illustrated. 







Fig. 9. Equivalent high-frequency circuit for the FC unit cell. 

Table 2 summarizes the parameters associated with the impedances shown in the small-signal
equivalent circuit. The rest of the elements, $g_m(N)$, $g_{ds}(PF)$, $C_{gd}(N)$ and $C_{ds}(PF)$,
directly represent the parameters of the respective transistor.

$g = g_{ds}(N) + 2g_{ds}(P)$
$g_{out} = g_{ds}(NS)$
$g_{MP}(PF) = g_m(PF) + g_{mb}(PF)$
$C_{in}(N) = C_{gs}(N) + C_{gb}(N)$
$C = C_{ds}(N) + C_{bd}(N) + 2C_{gd}(P) + 2C_{ds}(P) + 2C_{bd}(P) + C_{gs}(PF) + C_{bs}(PF)$
$C_{out} = C_{gd}(PF) + C_{bd}(PF) + C_{gd}(NS) + C_{ds}(NS) + C_{bd}(NS)$

Table 2. Small-signal parameters for the FC unit cell.

Great similarity is obtained in the description of both unit cells. Even the FC capacitive 
parameter C will be equivalent to C+C(x) in the HS description. This parallelism will also 
appear in the complete transconductor analysis. 

4.3 High-frequency model of the complete transconductance cell 

Under these conditions, the differential gain of the proposed transconductor cell (in both 
implementations) can be calculated by: 



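\[ H(s) = K\,\frac{(s - s_0)(s + s_1)}{s^2 + x_1 s + x_2} \approx K\,\frac{(s - s_0)(s + s_1)}{(s + \delta_1)(s + \delta_2)} \tag{8} \]

where:

\[ K = -\frac{A_{DC}\,\delta_1\,\delta_2}{s_0\,s_1}; \quad x_1 = \frac{\beta}{\gamma}; \quad x_2 = \frac{\alpha}{\gamma}; \quad \delta_1 \approx \frac{\alpha}{\beta}; \quad \delta_2 \approx \frac{\beta}{\gamma} \tag{9} \]

Denominator factorization of Eq. (8) yields two parasitic poles, $-\delta_1$ and $-\delta_2$, but only
if the approximation $(\alpha\gamma/\beta^2) \ll 1$ holds. Furthermore, parasitic zero $-s_1$ must be
negligible in the frequency range of interest. Both considerations will be demonstrated,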
either in the HS or in the FC approach (si>>s and si>>so: si(HS)=1600 so=1100 GHz, 
si(FC)=30 s =1900 GHz). 
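
The validity of the two-pole factorization can be checked numerically. The sketch below
assumes that the denominator of Eq. (8) is written as $\gamma s^2 + \beta s + \alpha$, which is the form
implied by the pole ratios $\alpha/\beta$ and $\beta/\gamma$ used in the text. The function name and the
coefficient values are illustrative assumptions, not design data.

```python
import math

def parasitic_poles(alpha, beta, gamma):
    """Pole frequencies of gamma*s**2 + beta*s + alpha (positive coefficients assumed).

    Returns (exact, approx, ratio): the exact pair of pole frequencies, the
    approximations (alpha/beta, beta/gamma), and ratio = alpha*gamma/beta**2,
    the quantity that has to be much smaller than 1.
    """
    disc = beta * beta - 4.0 * alpha * gamma
    if disc < 0.0:
        raise ValueError("complex-conjugate poles: no two-real-pole factorization")
    root = math.sqrt(disc)
    # Numerically stable forms of the roots (avoid cancellation in beta - root).
    low = 2.0 * alpha / (beta + root)
    high = (beta + root) / (2.0 * gamma)
    return (low, high), (alpha / beta, beta / gamma), alpha * gamma / (beta * beta)

# Illustrative, dimensionless coefficients (assumed).
exact, approx, ratio = parasitic_poles(alpha=2.0, beta=1.0e3, gamma=1.0)
print("alpha*gamma/beta^2 =", ratio)
print("exact poles :", exact)
print("approx poles:", approx)
```

When the ratio is small, the relative error of both approximate poles is of the same order as
the ratio itself, which is why the condition is checked for each design before the simplified
expressions are used.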

Due to the use of a pseudo-differential structure, a careful study of the common-mode
behaviour is mandatory. Thanks to the topology proposed, the common-mode voltage is
stabilized by means of partial positive feedback as previously explained. Under these
assumptions:

\[ A_{CM}(s) = K_{CM}\,\frac{(s - s_0)(1 + s/s_1)}{s^2 + y_1 s + y_2} \approx K_{CM}\,\frac{s - s_0}{s^2 + y_1 s + y_2} \tag{10} \]

\[ K_{CM} = -\frac{A_{CM}\,y_2}{s_0}; \quad y_1 = \frac{\beta_{CM}}{\gamma_{CM}}; \quad y_2 = \frac{\alpha_{CM}}{\gamma_{CM}} \tag{11} \]


Eq. (10) once more shows the need to neglect parasitic zero $-s_1$, leading to the same
consideration as in the differential gain equation. In this analysis, denominator factorization
is more difficult to accomplish. However, the approximate equations are easily derived and,
making use of the figures obtained in each particular design, the possibility of using the
approximate common-mode transfer function will be analysed. Therefore, if
$(\alpha_{CM}\gamma_{CM}/\beta_{CM}^2) \ll 1$ is verified, the common-mode gain can be expressed as:

\[ A_{CM}(s) = K_{CM}\,\frac{s - s_0}{(s + \xi_1)(s + \xi_2)} \tag{12} \]

\[ \xi_1 \approx \frac{\alpha_{CM}}{\beta_{CM}} = \frac{y_2}{y_1}; \quad \xi_2 \approx \frac{\beta_{CM}}{\gamma_{CM}} = y_1 \tag{13} \]


Both transfer functions are characterized by two parasitic poles and one RHP-zero: the
differential mode by $-\delta_1$, $-\delta_2$ and $s_0$, and the common mode by $-\xi_1$, $-\xi_2$ and $s_0$ (zero
$s_0$ is the same in both transfer functions).¹ Consequently, in order to obtain a
transconductor design that is compatible with all the requirements of the active $G_m$-C filter
implementation, a proper analysis and characterization of these parasitic elements becomes
a top-priority challenge. From this study, their origin and frequency location may lead to
some design considerations to improve the integrator frequency response.

Differential dc-gain, common-mode dc-gain, $s_0$ and $s_1$ are summarized in table 3 for both
implementations. The parasitic poles can be calculated by using $\alpha$, $\beta$, $\gamma$, $\alpha_{CM}$, $\beta_{CM}$
and $\gamma_{CM}$, as shown in Eqs. (9) and (13). As the resulting relations are very complicated, it
is necessary to look for the dominant terms and obtain approximate expressions to draw
conclusions. They can be simplified to analyse and understand the behaviour of the
transconductance cell and its frequency limits associated with second-order effects, which
separate the frequency behaviour of the proposed topology from the expected ideal response.

4.4 HS transconductance cell frequency response 

Considering the previous study of the HS unit cell, a detailed analysis of the frequency 
response of the complete HS integrator is carried out. The dominant terms of these 
expressions are subsequently obtained in this section. In order to simplify the notation, the 
small-signal parameters are redefined in table 4. 



¹ As corresponds to stable systems, negative poles $-\delta_1$, $-\delta_2$, $-\xi_1$ and $-\xi_2$ have been
obtained in the transfer functions for both implementations. For simplicity, when referring
to these poles, their associated frequency ($\delta_1$, $\delta_2$, $\xi_1$, $\xi_2$) will be the magnitude
considered.

HS transconductor:
$s_0 = \dfrac{g_m(N)}{C_{gd}(N)}$
$s_1 = \dfrac{g_M(NC) + g_{ds}(NC)}{C_{ds}(NC)}$
$A_{DC} = \dfrac{g_m(N)\,\big(g_M(NC) + g_{ds}(NC)\big)}{\alpha}$
$A_{CM} = \dfrac{-g_m(N)\,\big(g_M(NC) + g_{ds}(NC)\big)}{\alpha_{CM}}$

FC transconductor:
$s_0 = \dfrac{g_m(N)}{C_{gd}(N)}$
$s_1 = \dfrac{g_{MP}(PF) + g_{ds}(PF)}{C_{ds}(PF)}$
$A_{DC} = \dfrac{g_m(N)\,\big(g_{MP}(PF) + g_{ds}(PF)\big)}{\alpha}$
$A_{CM} = \dfrac{-g_m(N)\,\big(g_{MP}(PF) + g_{ds}(PF)\big)}{\alpha_{CM}}$

Table 3. Summary of the high-frequency parameters.

$G = g_{ds}(N) + g_M(NC) + g_{ds}(NC)$
$C_{IN} = C_s + 2C_{in}(N) + C_{out} = C_s + 2C_{gs}(N) + 2C_{gb}(N) + C_{gb}(NC) + C_{bd}(NC) + C_{gd}(PC)$
$C_\beta = C_{out} + C_{in}(N) = C_{gs}(N) + C_{gb}(N) + C_{gb}(NC) + C_{bd}(NC) + C_{gd}(PC)$

Table 4. Impedance parameters for the HS integrator.

By analysing in detail the small-signal equivalent model for the complete HS integrator, the
value of total integration capacitance $C_I$ can be calculated by Eq. (14). This definition will
lead to a simplification of the parasitic pole expressions.

\[ C_I = C_{IN} + C_\beta = C_s + 3C_{in}(N) + 2C_{out} \tag{14} \]

Firstly, the differential and common-mode gain can be expressed as follows:

\[ A_{DC} = \left[ (A_N - A_P) + \frac{2\,g_{ds}(NC)\,g_{ds}(N)}{g_m(N)\,\big(g_M(NC) + g_{ds}(NC)\big)} \right]^{-1} \tag{15} \]

\[ A_{CM} = -\left[ (A_N + A_P) + \frac{2\,g_{ds}(NC)\,g_{ds}(N)}{g_m(N)\,\big(g_M(NC) + g_{ds}(NC)\big)} \right]^{-1} \approx \frac{-1}{A_N + A_P} \tag{16} \]




In these expressions, the differential negative resistance obtained by the partial positive
feedback compensation is shown by means of the difference $\delta g_m = (A_N - A_P)\,g_m(N)$. The
existence of this negative resistance allows the differential dc-gain to be enhanced. Parasitic
poles $\delta_1$ and $\delta_2$ can be calculated by means of ratios between $\alpha$, $\beta$ and $\gamma$. Final
expressions are summarized in table 6. The origin of second-order effects can be better
understood by focusing on their dependence.

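\[ \alpha \approx \big((A_N - A_P)\,g_m(N)\big)\big(g_M(NC) + g_{ds}(NC)\big) + 2\,g_{ds}(NC)\,g_{ds}(N) \approx 2\,g_{ds}(NC)\,g_{ds}(N) \tag{17} \]

\[ \beta \approx G\,\big(C_s + 3C_{in}(N) + 2C_{out}\big) = G\,C_I \tag{18} \]

where, considering the dominant term in the transconductance $G$, the expression can be
written as:

\[ \beta \approx \big(g_{ds}(N) + g_M(NC) + g_{ds}(NC)\big)\,C_I \approx g_M(NC)\,C_I \tag{19} \]

Therefore, the parasitic pole $\delta_1$ can be expressed as:

\[ \delta_1 \approx \frac{\alpha}{\beta} \approx \frac{2\,g_{ds}(NC)\,g_{ds}(N)}{g_M(NC)\,C_I} \tag{20} \]

Following the same process for the other pole $\delta_2$, we obtain:

\[ \gamma \approx C_I\,\big(C_{gd}(N) + C_{ds}(NC) + C(x)\big) \approx C_I\,C(x) \approx C_I\,C_{gs}(NC) \tag{21} \]

\[ \delta_2 \approx \frac{\beta}{\gamma} \approx \frac{G}{C(x)} = \frac{g_{ds}(N) + g_M(NC) + g_{ds}(NC)}{C_{ds}(N) + C_{bd}(N) + C_{gs}(NC) + C_{bs}(NC)} \approx \frac{g_M(NC)}{C_{gs}(NC)} \tag{22} \]

Similar results can be obtained for the common-mode frequency response:

\[ \alpha_{CM} \approx \big((A_N + A_P)\,g_m(N)\big)\big(g_M(NC) + g_{ds}(NC)\big) \approx (A_N + A_P)\,g_m(N)\,g_M(NC) \tag{23} \]

\[ \beta_{CM} \approx \beta \approx g_M(NC)\,C_I; \quad \gamma_{CM} \approx \gamma \approx C_I\,C_{gs}(NC) \tag{24} \]

\[ \xi_1 \approx \frac{\alpha_{CM}}{\beta_{CM}} \approx \frac{(A_N + A_P)\,g_m(N)\,g_M(NC)}{G\,C_I} \approx \frac{(A_N + A_P)\,g_m(N)}{C_I}; \quad \xi_2 \approx \frac{\beta_{CM}}{\gamma_{CM}} \approx \frac{G}{C(x)} \approx \frac{g_M(NC)}{C_{gs}(NC)} \tag{25} \]
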

4.5 FC transconductance cell frequency response 

Considering the previous study of the FC unit cell, a detailed analysis of the frequency 
response of the complete FC integrator is carried out. The dominant terms of these 
expressions are obtained in this section. In order to simplify the notation, two small-signal 
parameters are redefined in table 5. 

$G_I = G_s + 2g_{out}$
$C_I = C_s + 3C_{in}(N) + 2C_{out} = C_s + 3C_{gs}(N) + 3C_{gb}(N) + 2C_{gd}(PF) + 2C_{bd}(PF) + 2C_{gd}(NS) + 2C_{ds}(NS) + 2C_{bd}(NS)$

Table 5. Impedance parameters for the FC integrator.

In accordance with the analysis of the small-signal equivalent model for the complete FC
integrator, parameter $C_I$, defined in table 5, directly represents the total integration
capacitance, the expression of which is the same as in the HS integrator. Therefore, the total
integration capacitance of both integrator implementations can be calculated by Eq. (26).

\[ C_I = C_s + 3C_{in}(N) + 2C_{out} \tag{26} \]

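For the FC integrator, the differential and common-mode gain can be expressed as follows:

\[ A_{DC} = \left[ (A_N - A_P) + \frac{2\,g_{ds}(NS)\,\big(g_{MP}(PF) + 2g_{ds}(P)\big)}{g_m(N)\,\big(g_{MP}(PF) + g_{ds}(PF)\big)} \right]^{-1} \approx \left[ (A_N - A_P) + \frac{2\,g_{ds}(NS)}{g_m(N)} \right]^{-1} \tag{27} \]

\[ A_{CM} = -\left[ (A_N + A_P) + \frac{2\,g_{ds}(NS)\,\big(g_{MP}(PF) + 2g_{ds}(P)\big)}{g_m(N)\,\big(g_{MP}(PF) + g_{ds}(PF)\big)} \right]^{-1} \approx \frac{-1}{A_N + A_P} \tag{28} \]
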

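In these expressions, the negative resistance obtained by the partial positive feedback
compensation is shown again by means of the difference $\delta g_m = (A_N - A_P)\,g_m(N)$. Parasitic
poles $\delta_1$ and $\delta_2$ are obtained by means of ratios among $\alpha$, $\beta$ and $\gamma$, as in the HS
implementation. The final expressions are summarized in table 6. Consequently, parasitic
poles $\delta_1$ and $\delta_2$ can be expressed as:

\[ \alpha \approx \big((A_N - A_P)\,g_m(N)\big)\big(g_{MP}(PF) + g_{ds}(PF)\big) + 2\,g_{ds}(NS)\,\big(g_{MP}(PF) + 2g_{ds}(P)\big) \approx 2\,g_{ds}(NS)\,\big(g_{MP}(PF) + 2g_{ds}(P)\big) \tag{29} \]

\[ \beta \approx \big(g + g_{ds}(PF) + g_{MP}(PF)\big)\,C_I \approx \big(g_{MP}(PF) + 2g_{ds}(P)\big)\,C_I \tag{30} \]

\[ \delta_1 \approx \frac{\alpha}{\beta} \approx \frac{2\,g_{ds}(NS)}{C_I} \tag{31} \]

\[ \gamma \approx C_I\,\big(C_{ds}(PF) + C_{gd}(N) + C\big) \approx C_I\,C \approx C_I\,\big(2C_{gd}(P) + 2C_{ds}(P) + C_{gs}(PF) + C_{bs}(PF)\big) \approx 2\,C_I\,C_{gd}(P) \tag{32} \]

\[ \delta_2 \approx \frac{\beta}{\gamma} \approx \frac{g_{MP}(PF) + 2g_{ds}(P)}{2C_{gd}(P) + 2C_{ds}(P) + C_{gs}(PF) + C_{bs}(PF)} \approx \frac{g_{MP}(PF) + 2g_{ds}(P)}{2\,C_{gd}(P)} \tag{33} \]

Similar results can be obtained for the common-mode frequency response:

\[ \alpha_{CM} \approx \big((A_N + A_P)\,g_m(N)\big)\big(g_{MP}(PF) + g_{ds}(PF)\big) + 2\,g_{ds}(NS)\,\big(g_{MP}(PF) + 2g_{ds}(P)\big) \approx (A_N + A_P)\,g_m(N)\,\big(g_{MP}(PF) + g_{ds}(PF)\big) \tag{34} \]

\[ \beta_{CM} \approx \beta \approx \big(g_{MP}(PF) + 2g_{ds}(P)\big)\,C_I; \quad \gamma_{CM} \approx \gamma \approx 2\,C_I\,C_{gd}(P) \tag{35} \]

\[ \xi_1 \approx \frac{\alpha_{CM}}{\beta_{CM}} \approx \frac{(A_N + A_P)\,g_m(N)}{C_I}; \quad \xi_2 \approx \frac{\beta_{CM}}{\gamma_{CM}} \approx \frac{g_{MP}(PF) + 2g_{ds}(P)}{2\,C_{gd}(P)} \tag{36} \]
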



4.6 Comments and discussion 

A full analysis has been developed in previous sections in order to draw some design 
strategies and implement a competitive and robust transconductor cell. Great similarity was 
found between the two topologies, as reflected in table 6. 

Table 6. Parasitic poles and zeros for the integrator topology.

The first conclusion regards total integration capacitance, $C_I$, which has the same definition
in both implementations: $C_I = C_s + 3C_{in}(N) + 2C_{out}$ (Eq. 26), where $C_s$ represents the
equivalent parallel capacitance between the external capacitance and the Norton capacitance
of the input current source. This definition can easily be obtained by taking two different
ideas into account. Considering the proposed integrator topology (Fig. 3), the expected
integration capacitance will be the external capacitance in addition to the contribution of
parasitic capacitors to the input node. For example, looking at the positive branch (the upper
one) in this figure, there are three transconductor cell inputs ($g_m(1+)$, $g_m(2+)$ and $g_m(-)$)
and two outputs ($g_m(1+)$, $g_m(2-)$) connected to the integrator input and contributing to $C_I$.
Each unit cell forming the complete integrator has been studied in detail and their
corresponding models obtained (tables 1 and 2). From this analysis, $C_{in}(N)$ and $C_{out}$ are the
equivalent input and output capacitance respectively for each unit cell, which, combined with
the previous consideration, leads directly to the definition obtained for $C_I$.
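
The counting argument above (three unit-cell inputs and two unit-cell outputs at each
integrator input) translates directly into a small calculation. The sketch below is only an
illustration: the function names and the capacitance values are assumptions, not figures from
the implemented designs.

```python
def total_integration_capacitance(c_s, c_in, c_out):
    """C_I = C_s + 3*C_in(N) + 2*C_out, all values in farads."""
    return c_s + 3.0 * c_in + 2.0 * c_out

def parasitic_fraction(c_s, c_in, c_out):
    """Share of C_I contributed by the unit-cell input and output capacitances."""
    c_i = total_integration_capacitance(c_s, c_in, c_out)
    return (c_i - c_s) / c_i

# Assumed example values: C_s = 200 fF, C_in(N) = 50 fF, C_out = 20 fF.
c_i = total_integration_capacitance(200e-15, 50e-15, 20e-15)
print(f"C_I = {c_i * 1e15:.0f} fF, parasitic share = {parasitic_fraction(200e-15, 50e-15, 20e-15):.2f}")
```

With numbers of this order the unit-cell capacitances account for roughly half of $C_I$, which
illustrates why the parasitic contribution cannot be treated as a small correction.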

The low-impedance internal node strategy, combined with the use of a negative resistance to
enhance the differential dc-gain of the system while increasing the differential input
resistance and reducing the phase error, suggested the use of cascode topologies, which also
offer an inherent reduction of the transmission zero. The differential dc-gain obtained in
both structures, shown in Eqs. (15) and (27) respectively, reflects the existence of this
positive feedback compensation, making the ideal infinite dc-gain of the integrator possible.
Regarding the common-mode behaviour of the circuit, both implementations are described by
the same expression, which complies with the stability requirement $|A_{CM}| < 1$:
$A_{CM} \approx -1/(A_N + A_P) \approx -1/2$.
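
The interplay between the negative resistance, the dc-gain and the common-mode rejection can
be sketched numerically. The code below assumes that the HS dc-gains take the form
$1/((A_N - A_P) + \varepsilon)$ and $-1/((A_N + A_P) + \varepsilon)$, with
$\varepsilon = 2g_{ds}(NC)g_{ds}(N)/(g_m(N)(g_M(NC) + g_{ds}(NC)))$. The function names and the
small-signal values are illustrative assumptions.

```python
import math

def hs_dc_gains(a_n, a_p, gm_n, gM_nc, gds_nc, gds_n):
    """Differential and common-mode dc-gain of the HS integrator (assumed closed forms)."""
    eps = 2.0 * gds_nc * gds_n / (gm_n * (gM_nc + gds_nc))
    a_dc = 1.0 / ((a_n - a_p) + eps)
    a_cm = -1.0 / ((a_n + a_p) + eps)
    return a_dc, a_cm

def to_db(gain):
    """Magnitude of a voltage or current gain in dB."""
    return 20.0 * math.log10(abs(gain))

# Assumed values: perfectly matched A_N = A_P = 1, conductances in siemens.
a_dc, a_cm = hs_dc_gains(1.0, 1.0, gm_n=1e-3, gM_nc=1.2e-3, gds_nc=5e-5, gds_n=5e-5)
cmrr_db = to_db(a_dc) - to_db(a_cm)
print(f"A_DC = {to_db(a_dc):.1f} dB, A_CM = {a_cm:.3f}, CMRR = {cmrr_db:.1f} dB")

# Cross-check with the figures quoted in the text: 55 dB of dc-gain and |A_CM| = 1/2.
print(f"CMRR from quoted figures = {55.0 - to_db(0.5):.1f} dB")
```

The last line shows that a differential dc-gain of 55 dB together with a common-mode gain of
about $-1/2$ gives a CMRR of about 61 dB, in line with the 60 dB quoted earlier.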

Variations in the negative resistance value are equivalent to considering a mismatch
between $M_N(1)$ and $M_N(2)$, which generates a difference between the drain currents of these
transistors. Furthermore, the same effect can also be achieved by modifying bias currents
$I_{bias}$ in these two branches of the circuit. In consequence, tuning the value of this
negative resistance allows correction of process deviations and of mismatch between the
transistors $M_N$ that process the signal.

Finally, cascode stages introduce two dominant parasitic poles and a dominant zero as
shown in Eq. (8), which stem from parasitic capacitances mainly associated with the source
nodes of the cascode transistors. In consequence, an excess phase shift $\Delta\varphi(\omega_t)$ takes
place at the unity-gain frequency (Eq. 37). Theoretically, by using the minimum length for
cascode transistors and minimizing source and drain diffusion areas, the parasitic poles and
zero can be located further away from the unity-gain frequency. This consideration will also
minimize distributed-channel effects, which cannot be ignored at high frequencies (Tsividis,
1999). However, the minimum channel length is conditioned by the required transistor
matching (Pelgrom et al., 1989).

These general considerations reflect the benefits of the topology and its frequency limits.
Nevertheless, both integrators have been implemented, and the real values drawn from
these designs allow a closer study of the frequencies related to parasitic effects and a
comparison between the two topologies.
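
To get a feeling for the size of the excess phase, the sketch below evaluates the phase of a
transfer function of the form of Eq. (8) at the unity-gain frequency and compares it with the
ideal integrator phase of -90 degrees. Eq. (37) is not reproduced in this section, so the
expression used here is simply the textbook phase of first-order factors, taken as an
assumption. All frequencies are illustrative, not design values.

```python
import math

def excess_phase_deg(f_t, delta1, delta2, s0, s1):
    """Excess phase (degrees) at f_t with respect to an ideal -90 degree integrator.

    delta1: dominant (low-frequency) pole set by the finite dc-gain -> phase lead
    delta2: parasitic pole                                          -> phase lag
    s0:     right-half-plane zero                                   -> phase lag
    s1:     left-half-plane parasitic zero                          -> phase lead
    All arguments must use the same frequency unit.
    """
    lead = math.atan(delta1 / f_t) + math.atan(f_t / s1)
    lag = math.atan(f_t / delta2) + math.atan(f_t / s0)
    return math.degrees(lead - lag)

# Assumed example: f_t = 100 MHz, dc-gain of 500 (delta1 = f_t / 500),
# parasitic pole at 20 GHz, RHP zero at 30 GHz, LHP zero at 1000 GHz.
print(f"{excess_phase_deg(100e6, 0.2e6, 20e9, 30e9, 1000e9):.3f} deg")
```

A negative result means a net phase lag; moving $\delta_2$ and $s_0$ further above the unity-gain
frequency, as suggested in the text, shrinks the lag terms.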

Second-order frequency dependence. The previous study for the frequency behaviour of 
the integrator was summarized in Eq.(8), and, taking into account the definition for the 
integrator quality factor (Eq. 4), the general relation in both approximations can be reported 



i _ s g „+g; c gd (N) c p (c) gm (N) m 

Qm g,„(N) C, C, g m (C) 

where $g'$ is the sum of output conductances at the input node, $C_I$ is the total integration
capacitance, $g_m(C)$ is the transconductance of the cascode element ($g_m(NC)$ and $g_m(PF)$ for
the HS and FC approach respectively), and $C_p(C)$ the parasitic capacitance at the
low-impedance node associated with the cascode structures (node X in Figs. 8 and 9).
The second-order frequency dependence can be analysed by considering the $M_N$
transistor mismatch $A_N - A_P = \delta g_m/g_m$ and including the frequency dependence of their
transconductance $g_m(s)$ (Eq. 39) in the transfer function (Smith et al., 1996; Zele et al., 1996).
In this way, the non-quasistatic model for the MOS transistor, required for operation at very
high frequencies, is included.

gm{s)= sJm . r jse (39 ) 

1 + sr 5 g m (N) V ; 

By simplifying the result obtained to draw some conclusions, the frequency behaviour of the 
integrator is described (Eq. 40) reflecting the effect of these additional second-order 
dependencies. 

1 J 8/ Y Sg„ 3C gs (N) C gd (N) C p (C) gm(N) 



Q iM U,( N )J 2»(N) 7C, C, C, gJC) 

From this study, a set of parasitic poles and zeros appear at frequencies higher than 15 GHz 
in the proposed designs, the effect of which is considered negligible in a first-order 
approximation. 

5. Noise 

In general, the range of signals that can be accurately processed by electronic devices is
limited. For small signals, electrical noise restricts the minimum amplitude that can be
processed. Noise comprises all the unwanted electrical signals generated within the device,
or generated externally and coupled to the output of the system. These signals appear in the
system whether input signals are applied or not. Noise interferes with the incoming signal
and makes it impossible to detect, with sufficient quality, signals whose amplitude is
comparable to the noise level. Moreover, signals below this level are almost impossible to
detect. So, noise in the system sets the lowest usable level for the incoming signal
(Silva-Martinez et al., 2003). The origins of noise can be classified as intrinsic and
extrinsic transistor noise sources, following a notation similar to that used for the
parasitic elements of the MOS transistor. There are two dominant intrinsic noise sources in
CMOS transistors: channel thermal noise and 1/f or flicker noise. Extrinsic noise is mainly
due to the signals produced by the surrounding circuitry and coupled to the device or to the
system. The only noise considered in this work is noise generated by the transistor.

The noise of a $G_m$-C filter is caused by the noise output currents of the transconductors, and
the mean square noise currents are generally proportional to the corresponding
transconductance. Considering the model of a noisy transconductor, the output mean
square noise current over a certain frequency interval $df$ is given by:

\[ \overline{i_n^2} = 4kT\,NF\,g_m\,df \tag{41} \]

where $k$ is the Boltzmann constant, $T$ the absolute temperature and $NF$ a noise factor
determined by the electronic design of the transconductor. $NF = 1$ corresponds to a
transconductor output noise current equal to the thermal noise current of a resistor of value
$R = 1/g_m$.
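
A short numerical check of the noise expression may help to fix orders of magnitude. The
sketch below evaluates the mean square noise current of a transconductor for a given noise
factor and compares the $NF = 1$ case with the thermal noise current of a resistor
$R = 1/g_m$. The function names, the 300 K temperature, the 1 mS transconductance and the
100 MHz bandwidth are illustrative assumptions.

```python
import math

K_BOLTZMANN = 1.380649e-23  # J/K

def noise_current_psd(gm, nf, temp_k=300.0):
    """Output noise current density 4*k*T*NF*gm, in A^2/Hz."""
    return 4.0 * K_BOLTZMANN * temp_k * nf * gm

def rms_noise_current(gm, nf, bandwidth_hz, temp_k=300.0):
    """RMS noise current over a flat bandwidth, in A."""
    return math.sqrt(noise_current_psd(gm, nf, temp_k) * bandwidth_hz)

gm = 1e-3                                      # assumed transconductance: 1 mS
resistor_psd = 4.0 * K_BOLTZMANN * 300.0 * gm  # 4kT/R with R = 1/gm = 1 kOhm
print(noise_current_psd(gm, nf=1.0), resistor_psd)
print(f"rms noise over 100 MHz: {rms_noise_current(gm, 1.0, 100e6) * 1e9:.1f} nA")
```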


Fig. 10. Transconductor noise model.

To investigate the latter effect in the frequency design of VHF filters, calculating the noise
factor and its frequency dependence becomes necessary in both HS and FC transconductor
implementations, because filter noise behaviour depends not only on the filter parameters
but also on the transconductor noise behaviour. Therefore, the NF-factor appears as a useful
quantity to translate the transconductor noise properties to the filter noise properties and,
likewise, the properties of each unit cell to the complete transconductor implementation. In
consequence, transconductor output noise is determined by the noise properties of the unit
cell and by the properties of the transconductor topology.

The transconductor topology analysed is the balanced structure shown in Fig. 3. First, the
noise behaviour of each single unit cell is calculated ($H_{HS}(s)$ and $H_{FC}(s)$) and then the
noise factor of the complete transconductor in both implementations, $F_{HS}(s)$ and $F_{FC}(s)$.

5.1 Noise model for the unit cell 

For high and very-high frequencies, the 1/f or flicker noise can be neglected. Assuming only
thermal noise, the drain-current noise of a single transistor can be written as:

\[ \overline{i_n^2} = 4kT\,c\,g_m\,df \tag{42} \]

where $k$ is the Boltzmann constant, $T$ the absolute temperature, $g_m$ the small-signal
transconductance of the transistor, and $df$ the frequency interval over which the noise is
measured. The constant $c$ is 2.5 for a short-channel transistor working in the saturation
region² (Tsividis, 1999). The noise output current of the HS unit cell (Fig. 6(a)) is:

$$\overline{i_n^2}(HS) = 4kT\,H_{HS}\,df \qquad (43)$$

where: 

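$$H_{HS} = \left(1-\sqrt{\alpha}\right)^2 c(PC)\,g_m(PC) + \alpha\,c(P)\,g_m(P) + \left(1-\sqrt{\beta}\right)^2 c(NC)\,g_m(NC) + \beta\,c(N)\,g_m(N) \qquad (44)$$

$$\sqrt{\alpha} = \frac{g_m(PC)+g_{ds}(PC)}{g_m(PC)+g_{ds}(PC)+g_{ds}(P)}, \qquad \sqrt{\beta} = \frac{g_m(NC)+g_{ds}(NC)}{g_m(NC)+g_{ds}(NC)+g_{ds}(N)} \qquad (45)$$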

For the FC approach, the noise output current of the unit cell (Fig. 6(b)) can be written as: 

$$\overline{i_n^2}(FC) = 4kT\,H_{FC}\,df \qquad (46)$$

where: 

$$H_{FC} = \left(1-\sqrt{\alpha}\right)^2 c(PF)\,g_m(PF) + 2\alpha\,c(P)\,g_m(P) + c(NS)\,g_m(NS) + \alpha\,c(N)\,g_m(N) \qquad (47)$$

$$\sqrt{\alpha} = \frac{g_m(PF)+g_{ds}(PF)}{g_m(PF)+g_{ds}(PF)+2g_{ds}(P)+g_{ds}(N)} \qquad (48)$$



Several conclusions can be drawn from this study. The main noise sources in both topologies are the transistors Mn that process the signal and the bias current sources, with the latter providing the main contribution. The FC unit cell presents a noise factor 2.5 times higher than that of the HS topology, owing to the additional current sources required in this topology.

5.2 Noise model for the complete transconductor cell 

The noise factor of the complete transconductor can be calculated using the model obtained 
for the unit cells (Nauta, 1993), or by calculating the complete noise model of the proposed 
transconductance cell from the start. We use this second option to obtain a more accurate 
result, modelling the output noise currents of the transistors with noise current sources in 
parallel with the outputs as in the previous section. The two noise output currents of the differential transconductor are $\overline{i^2_{plus}}$ and $\overline{i^2_{minus}}$. Following a process similar to that used in the noise study of the unit cells, the noise output current of the complete HS transconductor can be written as:



$$\overline{i^2_{plus}}(HS) = \overline{i^2_{minus}}(HS) = 4kT\,F_{HS}\,df \qquad (49)$$



² The constant c, which models transistor noise behaviour, depends on both the operating region and the dimensions of the transistor. A theoretical derivation for a transistor working in the linear region leads to 1<c<2; however, c=2/3 for a long-channel device and c=2.5 for a short-channel device are obtained in the saturation region (Tsividis, 1999).
where:

$$F_{HS} = c(PC)\,F_{HS}(PC) + c(P)\,F_{HS}(P) + c(NC)\,F_{HS}(NC) + c(N)\,F_{HS}(N)$$

$$F_{HS}(PC) = x_1^2\left[(A_N + A_P - 2)\,g_m(N)\,\bigl(g_m(NC)+g_{ds}(NC)\bigr)\,x_2 + 2g_{ds}(P)\,g_{ds}(PC)\,x_3 + 2g_{ds}(N)\,g_{ds}(NC)\,x_2\right]^2 g_m(PC)$$



F HS (P) = 



FHs(PC)(g m (PC) + g ds (PQ 



gJP) 



F„ S (NC)= g ds (N) 



F HS (N) 



gJPQ{ g ds (P) 

* 2 g m (N)(g m (NC) + g ds (NC))(g m (NC) + g ds (NC) - g ds (N)) 



(50) 
(51) 

(52) 




*/ g m (NQ (53) 



a 



^ 2 {g„,(NC) + gds (NC)fg m (N) (54) 



(55) 



g^ + gJNQ + g.XNC) g lts (P) + g„XPQ + g„s(PQ 

a=x 2 (g m (NC) + g ds (NC))((A N + A p )g m (N)-2g ds (NC)) + 2g ds (NC) + 2g d XP)g ds (PC)x 3 (56) 

The noise output current of the complete transconductor implemented with FC unit cells can be written as:

$$\overline{i^2_{plus}}(FC) = \overline{i^2_{minus}}(FC) = 4kT\,F_{FC}\,df$$

where:

$$F_{FC} = c(PF)\,F_{FC}(PF) + c(P)\,F_{FC}(P) + c(NS)\,F_{FC}(NS) + c(N)\,F_{FC}(N)$$



rjpry-ri-a-^-^j gJPF) ; F FC (P) = 2a*[l -^ 



,(JP) 



F„ (N5) = | 1-^j g m (NS); F rc (N) = 1 1 + ^(A N + A p ) V iS: .„(N) 



(57) 

(58) 
(59) 

(60) 



$$\sqrt{\alpha} = \frac{g_m(PF)+g_{ds}(PF)}{g_m(PF)+g_{ds}(PF)+2g_{ds}(P)+g_{ds}(N)}$$



z = 2(l-a)^^- + 2^^1 + a(A N+ A p )(61) 



This study shows that the major noise contributions are those due to the NMOS transistors in the signal path, Mn, which account for almost the total contribution in the FC implementation. The contribution of the bias current sources, which was relevant in the noise analysis of the unit cells, becomes negligible when the complete fully-balanced current-mode topology is implemented. A significant benefit of this structure is that the noise-related difference between the two unit cells disappears in the complete transconductor, with almost the same value being obtained for F_HS and F_FC. Another important conclusion is that the output noise current is independent of the value of the negative resistance Sgm, an inherent characteristic of the topology.

Hence, the noise is dominated by the Mn transistors involved in the V-to-I conversion, with similar noise contributions being achieved for both transconductor implementations. The post-layout simulation results corroborate this, with a total input-referred noise of 11 nA rms being obtained for the HS topology and 8 nA rms for the FC approach. This is a very important design conclusion and should be considered further in the design of high-performance filters.

6. Distortion 

The ability to handle large signals with minimum distortion is an important consideration in 
most linear applications. In this section, the large signal characteristics of the current mode 
integrator for differential inputs are analysed using first-order square-law MOSFET models, 
which can provide a sufficiently accurate description of the large signal behaviour for 
design purposes. The proposed transconductor topology (Fig. 3) has been implemented by 
using two different cascode architectures: the HS and the FC approach. Transconductor V-I 
conversion is obtained in the output stage gm, and a negative resistance, shown in grey in 
the same figure, is connected at the input in order to improve the topology characteristics. 

A theoretical distortion analysis including all the non-linearities would be too complex and would provide no practical benefit. Consequently, each factor affecting linearity has been studied separately in order to obtain simple and efficient design strategies. There are two main contributions to transconductor distortion: non-linearities in the V-to-I conversion and non-linearities in the negative resistance. Each effect is analysed separately. The effects of capacitor non-linearities are not taken into account here.

6.1 Non-linearities in V-I conversion 

For this study, only the output stage will be considered, i.e., the transconductor cell without 
the negative resistance. This differential structure will present good linearity in V-I 
conversion if both paths are perfectly matched. Using a simple model of the MOS transistor, 
the drain currents of the N- and P-channel MOS transistor in strong inversion can be written 

as: 

Considering V_in+ = Vc + v_in/2 and V_in- = Vc - v_in/2, with v_in the differential input voltage of the transconductor cell, the differential output current can be written as:

$$I_{out} = I_{out}^{+} - I_{out}^{-} = 2\sqrt{I_{bias}\,\beta_n(N)}\;v_{in} \qquad (63)$$

where I_bias is the bias current per branch and β_n(N) is the factor associated with the Mn transistor in both implementations. The same result is obtained with the two possible transconductor structures, the HS and the FC topologies, leading to the same distortion analysis in both cases. This equation is valid as long as the transistors operate in strong inversion and saturation, and shows that the differential voltage-to-current conversion of the transconductor is perfectly linear, making use of the square-law and matching properties of the MOS transistors. However, mobility reduction will cause deviations from the transistor square-law behaviour, generating distortion. The mobility of the NMOS and PMOS transistors can be expressed in the first-order approximation as:

$$\mu_{n,p} = \frac{\mu_{0\,n,p}}{1 + \theta_{n,p}\,\left|V_{GS} - V_{T\,n,p}\right|} \qquad (64)$$

Considering again the output stage of the complete cell (Fig. 3), driven balanced around the 
common-mode voltage level Vc as previously explained, the differential output current can 
be calculated as: 

$$I_{out} = I_{out}^{+} - I_{out}^{-} \qquad (65)$$

Using the previous expressions for mobility reduction and assuming equal transistors 
operating in saturation, the differential output current can be expanded into Taylor series: 

i = k ? ; ex, i 3 o„K„ J_ p ,5 (66) 



I u C W 

where: V, . =V n - V„ = P^ > and k = Mm <" " 



P 

The fifth and higher order distortion terms can be neglected, giving the following as a result 
for distortion calculation: 



I„ i =C 1 B„+C 3 D I „ 3 ; with C 1= ^ 2 y ;C 3 = " °" . - (67) 

™ 3 =^ = -A—, ~*^r (68) 



- 1 8V (1 + V f\l + -0V I 

on \ n on ) I ri n on 



The expression for HD3 can be simplified when θ_n V_on ≪ 1. This expression is the same for both implementations of the transconductor cell. The main conclusion from Eq. (68) is that increasing V_on, i.e., raising the common-mode voltage Vc or the bias current I_bias, lowers the distortion in the V-I transfer.
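The trend stated above can be checked numerically. The sketch below is not the closed-form result of Eq. (68); it assumes a generic square-law device with first-order mobility reduction, I = β·V_ov²/(1 + θ·V_ov), drives two such devices in a balanced way around V_on, and extracts HD3 from one period of the differential current. The values of β, θ, V_on and the signal amplitude are hypothetical.

```python
import math


def drain_current(v_ov, beta, theta):
    """Square-law current with first-order mobility reduction (assumed model)."""
    if v_ov <= 0.0:
        return 0.0
    return beta * v_ov ** 2 / (1.0 + theta * v_ov)


def harmonic_amplitude(samples, harmonic):
    """Amplitude of one harmonic of a signal sampled over exactly one period."""
    n = len(samples)
    re = sum(s * math.cos(2.0 * math.pi * harmonic * k / n) for k, s in enumerate(samples))
    im = sum(s * math.sin(2.0 * math.pi * harmonic * k / n) for k, s in enumerate(samples))
    return 2.0 * math.hypot(re, im) / n


def hd3(v_on, v_peak, beta=1e-4, theta=0.2, n=256):
    """Third-harmonic distortion of the differential current for a balanced drive."""
    wave = []
    for k in range(n):
        v_in = v_peak * math.cos(2.0 * math.pi * k / n)
        i_plus = drain_current(v_on + v_in / 2.0, beta, theta)
        i_minus = drain_current(v_on - v_in / 2.0, beta, theta)
        wave.append(i_plus - i_minus)
    return harmonic_amplitude(wave, 3) / harmonic_amplitude(wave, 1)


for v_on in (0.3, 0.6):
    print(f"V_on = {v_on:.1f} V -> HD3 = {hd3(v_on, 0.1):.2e}")
print(f"theta = 0 (ideal square law) -> HD3 = {hd3(0.3, 0.1, theta=0.0):.1e}")
```

With θ = 0 the differential current is exactly linear in v_in, so the third harmonic vanishes to rounding error; with θ > 0, the larger V_on gives the lower HD3 under this assumed model.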

6.2 Non-linearities in the negative resistance 

Non-linearities in the V-I conversion and in the negative resistance introduce non-linear effects in the input resistance of the differential transconductor (Nauta, 1993). The effect of these non-linearities is a signal-dependent integrator quality factor, i.e., a phase error, which can cause distortion in filters, especially high-Q filters. As long as the input impedance is high enough and an enhancement of the dc-gain is obtained, the effect of the non-linearities will be slight. Considering the equations for a single transistor given in Eq. (62) and analysing the structure shown in Fig. 3, the differential input voltage can be written as:



$$v_{in} = \sqrt{\frac{I_{BIAS}}{\beta_n}}\left(\sqrt{1+\frac{i}{I_{BIAS}}} - \sqrt{1-\frac{i}{I_{BIAS}}}\right) \qquad (69)$$

where 2i is the differential input current, i.e., I_i+ = i and I_i- = -i, following the description shown in Fig. 3. The same relation is obtained in both implementations, leading again to the same distortion analysis. This expression can be expanded into a Taylor series, where A = i/I_BIAS, a_1 = 1 and a_3 = 1/8.



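$$v_{in} = a_1 A\cos(\omega t) + a_2 A^2\cos^2(\omega t) + a_3 A^3\cos^3(\omega t); \quad HD_2 = \frac{1}{2}\frac{a_2}{a_1}A; \quad HD_3 = \frac{1}{4}\frac{a_3}{a_1}A^2 \;\Rightarrow\; HD_3 = \frac{1}{32}\left(\frac{i}{I_{bias}}\right)^2 \qquad (70)$$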



The large-signal output characteristics of the current-mode integrator have a form similar to that of the standard MOS differential pair. On the other hand, the distortion generated in the positive-feedback compensation through the negative resistance depends only on the ratio between the input current and the bias current I_bias. This analysis is the same for both implementations of the transconductor topology.
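The dependence of HD3 on the current ratio alone can be illustrated numerically. The sketch below assumes a normalised square-root input characteristic, v_in ∝ √(1 + A) − √(1 − A) with A = i/I_bias, which is the form whose Taylor coefficients are a_1 = 1 and a_3 = 1/8, and compares the HD3 extracted from one period of the waveform with the small-signal prediction A²/32. The amplitudes are arbitrary test values.

```python
import math


def normalised_input_voltage(a):
    """v_in divided by its scale factor, for a normalised current a = i/I_bias."""
    return math.sqrt(1.0 + a) - math.sqrt(1.0 - a)


def harmonic_amplitude(samples, harmonic):
    """Amplitude of one harmonic of a signal sampled over exactly one period."""
    n = len(samples)
    re = sum(s * math.cos(2.0 * math.pi * harmonic * k / n) for k, s in enumerate(samples))
    im = sum(s * math.sin(2.0 * math.pi * harmonic * k / n) for k, s in enumerate(samples))
    return 2.0 * math.hypot(re, im) / n


def hd3_numeric(a_peak, n=256):
    """HD3 of v_in when the normalised input current is a_peak*cos(wt)."""
    wave = [normalised_input_voltage(a_peak * math.cos(2.0 * math.pi * k / n)) for k in range(n)]
    return harmonic_amplitude(wave, 3) / harmonic_amplitude(wave, 1)


def hd3_predicted(a_peak):
    """Small-signal prediction HD3 = A^2 / 32."""
    return a_peak ** 2 / 32.0


for a_peak in (0.05, 0.2, 0.5):
    print(f"A = {a_peak:.2f}: numeric {hd3_numeric(a_peak):.3e}, A^2/32 = {hd3_predicted(a_peak):.3e}")
```

The two values agree closely for small A and drift apart as A grows, because the prediction keeps only the third-order term of the expansion.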

6.3 Matching and harmonic-distortion 

The process that causes time-independent random variations in the physical parameters of identically designed devices is called mismatch, and it is a limiting factor in general-purpose analog signal processing. The impact of MOS transistor mismatch becomes very important as device dimensions are reduced and the available signal swings decrease. To support better circuit design, the physical origin of this effect has been discussed in several studies (Pelgrom et al., 1989), covering not only its random but also its systematic contribution (Gregor, 1992), as well as possible measurements for its characterization (Felt et al., 1994).

Thanks to the use of a fully-balanced pseudo-differential topology, whose inherent positive-feedback compensation enhances the differential dc-gain, distortion resulting from mismatch is small. As mentioned above, the negative resistance enables small variations in the dimensions of the Mn transistors and in the bias current sources I_bias to be controlled. In order to obtain good matching, the minimum channel length of the CMOS technology used must be avoided. However, high-frequency operation requires short channels, so the negative resistance is relied upon to obtain adequate matching between transistors in the signal path. In addition, channel length modulation is not considered, as it only has a substantial effect on integrator response linearity at low frequencies, where distortion is suppressed by feedback (Smith et al., 1996; Zele et al., 1996). Consequently, transistors are assumed to be well matched, which is achieved in the final layout by means of a design with low sensitivity to mismatch.

Mismatch between transistors also degrades the outstanding benefits provided by balanced structures, since these depend strongly on the symmetry of the circuit. Mismatch can therefore also be reflected in unbalanced signal paths. Post-layout simulations were carried out to assess the sensitivity to this unbalance for both the HS and FC transconductor implementations. The starting point was an input signal of 10 MHz and 45 dB of THD. Subsequently, variations of 10% and 20% in magnitude and phase between the input currents I_i+ and I_i- were considered while analysing the effect on the THD. Regarding the variations in magnitude: with a shift of 10% between the magnitudes of I_i+ and I_i-, the THD changes by 1 dB (HS) and 0.5 dB (FC), and for a shift of 20%, the THD changes by 2 dB and 1.5 dB respectively. For deviations between the phase of I_i+ and the phase of I_i-, with a shift of 10%, the THD changes by 1 dB for both topologies; and for a shift of 20%, the THD changes by 8 dB (HS) and 6 dB (FC).

As a result, we can conclude that the proposed topology is quite insensitive to transistor 
mismatching. In addition to this, the effect of common-mode signal mismatching is 
alleviated by means of feedback compensation, as previously explained, thus supporting the 
proposed design strategy. 

7. Digital programmability 

Apart from the usual requirements associated with high-frequency CMOS filter design, programmability brings to the forefront the considerable problem of maintaining performance metrics such as frequency-response accuracy, noise and dynamic range across the entire tuning range. The need for robust and precise implementation of filtering systems in the VHF range points to programmable G_m-C continuous-time filters as the best option for obtaining a wide programming range (usually 1:5). Due to process and temperature variations, G_m/C time constants are liable to vary by as much as ±30%. Taking both effects into account at the same time means that the unity-gain frequency ω_t of each integrator in the filter should be electronically variable over a wide range.

The lower supply voltages required by current digital CMOS technologies make conventional continuous tuning techniques very difficult to use over a wide frequency range because of their effect on dynamic range and non-linear distortion. These techniques are based on varying the biasing points of the transistors, which limits their application to compensating the inherent changes due to temperature and process variations. Therefore, discrete tuning is the best option for preserving the dynamic range (DR).

There are three different ways of achieving this wide range of variability: the capacitor, the transconductor or both can be made programmable. At high frequencies, the integrating capacitances are relatively small. If they are replaced by capacitor arrays to obtain C-programmability, the net parasitic capacitances at the terminals of the array can be quite large when the array is implementing the lowest effective capacitance, which is a very difficult problem to solve. In addition, although a switchable array of capacitors provides high precision in filters, it does so at the cost of switches in the signal path. The constant-C scaling technique is therefore the option considered, obtaining the desired programmability by varying G_m discretely while maintaining the noise specifications over the entire frequency range. Furthermore, lower power consumption is achieved at the low-frequency end of the programming range. This is the best option for maintaining a trade-off between noise specifications, power consumption and programming range (Pavan et al., 2000).

Two different strategies can be used to extend the tuning range and preserve DR: a switchable array of degenerating MOS resistors, and a parallel connection of identical transconductors switched by a digital word. The first uses the same transconductor and capacitor throughout the whole frequency range, whilst the degeneration resistor R is formed by a parallel connection of MOS triode transistors (Bollati et al., 2001). With this technique the noise factor, which is proportional to the degeneration resistor, varies over the frequency range, generating dynamic range variations. Phase errors also appear, and the worst case for each of these undesirable effects occurs at opposite ends of the frequency range: minimum DR (maximum R) at the lowest frequency f_min, and maximum phase error at the highest frequency f_max. However, this strategy leads to the simplest structure, with a small active area and lower power consumption. The second strategy consists of a parallel connection of identical transconductors, giving a programmable array in which the desired time constant can be digitally tuned (Pavan et al., 2000). This solution is the best option for VHF applications. However, its main drawbacks are power consumption and area, which are proportional to the number of connected active cells.

Considering all these ideas, the latter strategy is the technique selected for achieving the desired programmability for the proposed topology. Programmability using a parallel connection of conventional differential pairs has been published previously (Pavan et al., 2000); however, these structures are not directly suitable for low-voltage supplies. It is worth noting that the programming range of a transconductor must also include an additional margin of ±30% at the extreme transconductance values G_min and G_max in order to compensate the deviations due to temperature and process variations. Therefore, the total tunable range will be greater than the nominal one.
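A short calculation shows how much the margin widens the range. The sketch below assumes one simple reading of the text: G_min is extended downwards by the tolerance and G_max upwards by the same tolerance. The function name and this interpretation are illustrative assumptions.

```python
import math


def total_tuning_ratio(nominal_ratio, tolerance):
    """G_max/G_min needed to cover `nominal_ratio` with a +/- `tolerance` margin.

    Assumes G_max is raised by (1 + tolerance) and G_min lowered by (1 - tolerance).
    """
    if not 0.0 <= tolerance < 1.0:
        raise ValueError("tolerance must lie in [0, 1)")
    return nominal_ratio * (1.0 + tolerance) / (1.0 - tolerance)


for nominal in (5.0, 7.0):
    print(f"nominal 1:{nominal:.0f} with 30% margin -> total 1:{total_tuning_ratio(nominal, 0.3):.2f}")
```

Under this reading, a nominal 1:5 range with a 30% margin requires a total transconductance ratio of roughly 1:9.3, which is why coarse digital tuning is combined with a finer adjustment.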

7.1 Principle of programmability 

Our proposal is a digitally programmable transconductor, specifically designed for a wide programmability range, formed by a parallel connection of unit cells. Fig. 11 shows the conceptual scheme of a 3-bit programmable cell. This topology presents two main drawbacks: the need for additional transistors in the signal path, and the variation of the parasitic capacitances C_pin and C_pout with the digital word. However, it is necessary to keep constant both the dynamic range for each g_m value and the total node parasitic capacitances over the entire programming range.

This solution is adopted for the proposed transconductor topology, giving an HS implementation with a programmable range of 1:7 (Fig. 12(a)) and an FC implementation with a range of 1:5 (Fig. 12(b)). Each cascode unit cell (Fig. 6), i.e., cascode amplifier and biasing current source, consists of a parallel connection of equal cells switched by a digital word. The connecting lines of the substrate terminals are not shown in these schematics, as the explanation has already been given in previous sections. By driving the gates of the cascode transistors (M_NC and M_PC in the HS approach, and M_PF and M_NS in the FC) with modulated digital voltages, we can obtain the desired transconductance with no switches in the signal path and with power consumption proportional to the total transconductance.

The other disadvantage inherent to this topology can also be alleviated with an additional design strategy. When the digital word changes, some transistors move from the saturation to the cut-off region and vice versa, and different contributions to the total input node parasitic capacitance C_pin are obtained. This change can shift the desired frequency and vary the Q-factor, limiting the integrator and filter performance. Consequently, the implementation of each unit cell has been modified by using dummy elements connected at the input, which make the input capacitance independent of the digital word and maintain the same parasitic capacitances on the signal-processing nodes (Pavan et al., 2000). Note that the total output parasitic capacitance (junction extrinsic capacitance) is also constant because it has almost the same value for cut-off and saturation transistors (Tsividis, 1996; Tsividis, 1999).

Fig. 11. Topology of the 3-bit programmable transconductor.





Fig. 12. Implementation of the 3-bit programmable topology: (a) HS and (b) FC transconductor.

As the output conductance is proportional to the transconductance, the differential dc-gain is maintained irrespective of the digital word. Consequently, the relative shape of the frequency response, the output noise power and the dynamic range are independent of the digital word. We therefore obtain the desired programmable transconductor with no switches in the signal path by driving the gates of the bias and cascode transistors with a digital word modulated with the appropriate analog value. The power consumption is proportional to the transconductance required in each frequency range. The digital control word for programming the transconductor is b2b1b0, controlling the transconductance value from 001 = g_m to 111 = 7g_m in the HS topology and from 001 = g_m to 111 = 5g_m in the FC topology.
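The coarse-tuning principle for the HS case can be sketched as follows, assuming that the word b2b1b0 is binary weighted (001 = g_m up to 111 = 7g_m) and that the integration capacitance is constant. The FC mapping is not modelled here. The unit transconductance and the capacitance are hypothetical values, chosen only so that word 001 corresponds to 28 MHz.

```python
import math


def active_cells(word):
    """Number of unit cells switched on by a 3-bit word such as '101' (binary weighting assumed)."""
    if len(word) != 3 or any(bit not in "01" for bit in word):
        raise ValueError("expected a 3-bit word such as '101'")
    return int(word, 2)


def total_transconductance(word, unit_gm):
    """Total transconductance in S for the given digital word."""
    return active_cells(word) * unit_gm


def unity_gain_frequency(word, unit_gm, capacitance):
    """Ideal unity-gain frequency in Hz, f_t = G_m / (2*pi*C), with constant C."""
    return total_transconductance(word, unit_gm) / (2.0 * math.pi * capacitance)


UNIT_GM = 270e-6  # S, hypothetical unit-cell transconductance
CAPACITANCE = UNIT_GM / (2.0 * math.pi * 28e6)  # F, chosen so that word 001 gives 28 MHz

for value in range(1, 8):
    word = format(value, "03b")
    gm_total = total_transconductance(word, UNIT_GM)
    f_t = unity_gain_frequency(word, UNIT_GM, CAPACITANCE)
    print(f"{word}: G_m = {gm_total * 1e6:.0f} uA/V, f_t = {f_t / 1e6:.0f} MHz")
```

This idealised scaling is exactly linear in the digital word; parasitic poles and zeros make the real response deviate from it at the top of the range.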



7.2 Implementation results 

Two different integrators have been implemented. The first one is based on the HS topology, 
considering the total input node parasitic capacitance -basically gate-source capacitances- 
as the total integration capacitance with no need for any external capacitors, in order to 
reach the maximum operation frequency with moderate power consumption. Therefore, for 
this situation, maintaining the integration capacitance constant becomes essential and its 
value can be controlled by means of the dummy-based system. 

Fig. 13(a) plots the post-layout simulation results for HS implementation and shows the 
variation of unity-gain frequency versus digital word value. The expected linear 
dependence of the transconductance and the constant integration capacitance are observed, 
and a programming range from 28 to 185 MHz is obtained by varying the digital word. 
However, a marked phase lag due to parasitic effects (parasitic zero so) at high frequency was detected, as expected. A possible compensation scheme is based on two capacitors, Cc, implemented with dummy MOS transistors and connected in a cross-coupled manner as shown in Fig. 7(a). With this compensation scheme, the phase error is effectively reduced, remaining below 3° over the entire frequency range.

Fig. 13. Unity-gain frequency and phase vs. digital word value for the: (a) HS integrator with and without compensation scheme; (b) FC integrator with various compensation schemes.

The second implementation is based on the FC topology; in this case the integration capacitance is the total input node parasitic capacitance together with an additional external capacitance (C_ext = 1.2 pF). The total integration capacitance is once more kept constant by means of the dummy-based system, and Fig. 13(b) plots the post-layout simulation results. The first curve plots the variation of the transconductance as a function of the digital word when no external capacitance is connected at the input; a non-linear response is obtained, due to the expected parasitic poles and zeros. When the external capacitance is connected, the expected linear dependence is obtained, providing the system with coarse tuning. Nevertheless, a phase shift is obtained even when the compensation scheme based on the two cross-coupled capacitors Cc is used.

The next step is to connect a resistance R at the input in series with the external capacitance, as shown in Fig. 7(b). This resistor is implemented with a transistor working in the linear region and is the best option for compensating this phase error. Therefore, by varying the digital word, the unity-gain frequency is controlled and the phase error is effectively reduced over the entire programming range. Then, to control the operating frequency and to reduce the phase error, a shunt connection is made at the input between a resistance and the integration capacitance C_i. We thus obtain a compensation scheme for the FC transconductor based on an RC circuit at the input, leading to a programming range from 40 to 200 MHz by varying the digital word, with a phase error of less than 3° over the entire frequency range.

We can define the transconductor input voltage variations around the bias point (Vc) as shown in Fig. 3. The linear input range is constant for digital scaling of the transconductance, as shown in Fig. 14, which presents the variation of g_m as a function of the digital word, providing the system with coarse tuning. Consequently, for the HS topology, ω_t is controlled from 28 to 185 MHz by varying the digital word from 1 to 7, and for the FC topology, ω_t varies from 40 to 200 MHz by varying the digital word from 1 to 5. Therefore, by means of a parallel connection of equal transconductors switched by a digital word, we guarantee that the DR for each g_m value and the total external node capacitances are kept constant.

Fig. 14. Transconductance as a function of the digital word (coarse tuning) for the: (a) HS implementation; (b) FC implementation.

On the other hand, fine tuning can be achieved if necessary, as the transconductance value can be controlled by varying the bias current source for a fixed digital word. Hence, discrete steps are swept by varying the bias current while maintaining the same dynamic range. At the same time, additional control over the dc-gain can be achieved by modifying the ratio between the bias currents of the negative resistance, M1/M2 and M4/M5 in both topologies, solving problems associated with mismatch between transistors. Complete control of the frequency response can therefore be obtained. The trade-off between transconductance and linear input range is shown in Fig. 15 for both topologies. These figures can also be seen as the fine tuning of the proposed structure, since the transconductance value is controlled by varying the bias current source for a fixed digital word: changing I_bias from 45 to 180 µA controls the HS transconductance from 270 to 452 µA/V, and changing it from 40 to 100 µA in the FC topology controls the transconductance from 550 to 800 µA/V.
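As a rough cross-check of these fine-tuning figures, the sketch below compares the reported transconductance ratios with the ideal square-law expectation that g_m scales with the square root of the bias current. The numbers are those quoted above; attributing the shortfall to mobility reduction or other second-order effects is only a plausible interpretation, not a result stated in the text.

```python
import math

# Reported end points of the fine-tuning ranges (bias current in A, gm in S).
REPORTED = {
    "HS": {"i_bias": (45e-6, 180e-6), "gm": (270e-6, 452e-6)},
    "FC": {"i_bias": (40e-6, 100e-6), "gm": (550e-6, 800e-6)},
}


def square_law_gm_ratio(i_low, i_high):
    """Ideal gm ratio for a square-law device: gm proportional to sqrt(I_bias)."""
    return math.sqrt(i_high / i_low)


def compare(name):
    """Return (ideal square-law gm ratio, reported gm ratio) for one topology."""
    i_low, i_high = REPORTED[name]["i_bias"]
    gm_low, gm_high = REPORTED[name]["gm"]
    return square_law_gm_ratio(i_low, i_high), gm_high / gm_low


for name in ("HS", "FC"):
    ideal, reported = compare(name)
    print(f"{name}: ideal ratio {ideal:.2f}, reported ratio {reported:.2f}")
```

In both cases the reported ratio is somewhat below the ideal square-law value, which is the direction one would expect when the devices depart from pure square-law behaviour.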





Fig. 15. Transconductance versus biasing currents (fine tuning) for the: (a) HS implementation; (b) FC implementation.

To conclude, the proposed structure is a balanced topology aimed at improving linearity and immunity to digital noise. A digitally programmable transconductor has been designed that maintains the same dynamic range over the entire frequency range. It can therefore be used in the design of programmable filters, as it provides the characteristics expected of a programmable cell: Q-factor, noise power and maximum signal swing remain constant over the entire programming range, leading to a DR independent of the operating frequency. The expected linear dependence of the unity-gain frequency is obtained, and the phase error is effectively reduced over the entire programming range in both implementations, with a compensation scheme based on two cross-coupled capacitors for the HS topology and the classical RC circuit connected at the input for the FC approach.

8. Results and discussion 

To demonstrate the theoretical advantages of this approach for a programmable transconductor suitable for VHF, two 3-bit programmable integrators have been designed. The HS transconductor has been implemented using the design kit of an AMI Semiconductor (AMIS) 0.35 µm CMOS technology (P-substrate, N-well, 5-metal, 2-poly) with a 3 V power supply and a nominal bias current of 90 µA per branch, whereas the FC transconductor has been implemented using the design kit of an AMS (C35B4C3) 0.35 µm CMOS technology (P-substrate, N-well, 4-metal, 2-poly) with a 2 V power supply and a nominal bias current of 100 µA per branch.

The dimensions of the transistors were chosen to cover all the design requirements obtained in this chapter, so that each discrete step can be swept completely by varying the bias current. For the HS implementation, the operating point is located at 90 µA and the bias current can be adjusted from 45 to 180 µA. For the FC implementation, however, the operating point is located at 100 µA, and the digital step is covered by varying the bias current from 20 to 110 µA. In this way, the discrete tunability requirement is met while the FC transconductance value at the operating point is maximised.



8.1 Layout strategy 

A careful layout has been drawn to obtain accurately all the characteristics associated with the proposed design and to demonstrate the feasibility of the intended approach. As described below, we have taken special care to eliminate the unwanted effects related to parasitic elements and mismatch (Baker et al., 1998; Hastings, 2001). All the designs have been carried out taking into account the specific design rules for high-frequency operation, which are highly appropriate for obtaining good matching between components. Interdigitated and common-centroid layout techniques have been used to reduce the variations of threshold voltage, which are associated with gradients in gate-oxide thickness. Guard rings have been included in the design with the aim of reducing substrate noise. Bond pads have also been carefully laid out, with input and output pins placed as far apart as possible. Balanced structures provide outstanding benefits, but they are strongly dependent on the symmetry of the circuit. Consequently, special care has been taken in routing the paths of the balanced signals, in an attempt to ensure the best matching between them. MOS devices have fragile gates, since electrostatic discharges may destroy the device if the oxide breakdown voltage is exceeded. For this reason, the transistors that control the quality factor of the circuit were provided with a path protection system. The scheme chosen to achieve this goal was the anti-parallel diode configuration. This circuit is very simple but is sufficient for the purposes of this work.

Fig. 16(a) shows the layout of the HS test chip, with an active area of 0.10 mm². Fig. 16(b) shows the microphotograph of the programmable FC transconductor, with an active area of 0.04 mm² including the compensation RC circuit, where the integration capacitance has been implemented with a double-poly capacitor. The area of the FC active element is 0.03 mm², and a regular and compact arrangement of transistors can be observed.




Fig. 16. (a) Layout of the fully-balanced 3-bit programmable HS integrator; (b) microphotograph of the FC integrator, using double-poly capacitors.



8.2 Experimental results 

For the HS approach, a unity-gain frequency of 28 MHz was achieved with a power 
dissipation of 1.62 mW using a 3 V supply. By varying the digital word from 1 to 7, we 
expected to control the unity-gain frequency from 28 to 185 MHz; the experimental 
results show a variation between 25 and 185 MHz, as shown in Fig. 17(a). In the same 
figure, varying the bias current source from 45 to 180 µA for a fixed digital word 
modifies the transconductance value, providing complementary fine tuning of the 
frequency. All discrete steps are covered and, consequently, a frequency span of 25-185 
MHz can be provided. The maximum frequency error occurs at the maximum digital 
word, where the measured ratio deviates by 6 % from the ideal 7:1 ratio. 

Fig. 17. Experimental results for coarse and fine tuning of the (a) HS and (b) FC topology. 
Variation of the unity-gain frequency versus bias currents for all the digital words. 

For the FC approach, a unity-gain frequency of 40 MHz is achieved with a power 
dissipation of 2.4 mW using a 2 V supply, as expected from the post-layout simulation 
results. By varying the digital word from 1 to 5, the unity-gain frequency is controlled from 
40 to 190 MHz, as shown in Fig. 17(b). All discrete steps are swept by varying the bias 
current from 20 to 110 µA. The maximum frequency error occurs at the maximum 
digital word, where the measured ratio deviates by 5 % from the ideal 5:1 ratio. 
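
The deviations quoted for the two topologies can be checked with a few lines of arithmetic. The sketch below assumes that the deviation is the relative difference between the measured frequency ratio (maximum over minimum unity-gain frequency) and the ideal ratio set by the maximum digital word; the function name is illustrative and does not come from the original work.

```python
def ratio_deviation(f_min_mhz, f_max_mhz, ideal_ratio):
    """Relative deviation of the measured frequency ratio from the ideal one."""
    measured_ratio = f_max_mhz / f_min_mhz
    return (measured_ratio - ideal_ratio) / ideal_ratio

# Measured spans quoted in the text.
hs = ratio_deviation(25.0, 185.0, 7.0)   # HS: digital word from 1 to 7
fc = ratio_deviation(40.0, 190.0, 5.0)   # FC: digital word from 1 to 5

print(f"HS deviation: {hs:+.1%}")
print(f"FC deviation: {fc:+.1%}")
```

Under this assumption the HS ratio comes out slightly above the ideal value and the FC ratio slightly below it, with magnitudes close to the 6 % and 5 % figures given above.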

The next step is to demonstrate constant linearity by means of a constant THD over the 
entire programming range. Figs. 18 and 19 show the THD variation as a function of the 
differential output current for all the digital words. THD was measured for a sine input 
current at 10 MHz (a) and at the unity-gain frequency (b) in both topologies. These figures 
show the expected THD dependence, studied above in Section 6: lower bias currents or 
higher input signal amplitudes lead to higher THD values. A corner parameter analysis was 
carried out following the guidelines provided by the manufacturer of the 'AMI 
Semiconductor C035M Design-Kit', and the worst-case analysis for the HS integrator was 
obtained. This distortion study gave 1 % THD for a differential input signal of 56 µA at 
10 MHz. Experimental results for the design, shown in Fig. 18, give a differential input 
current of 50 µA under the same conditions. For the FC approach, the expected value for 
1 % THD was a differential input signal amplitude of 37 µA at 10 MHz, and the 
experimental results (Fig. 19) give an amplitude of 35 µA. 

The post-layout simulated input-referred noise of the HS topology, integrated up to 30 MHz, 
was 11.2 nA rms. Hence, the dynamic range, defined as the input signal amplitude at 
1 % THD divided by the total noise level integrated over 30 MHz, is 70 dB. In the FC 
structure, the input-referred noise integrated up to 40 MHz was 8 nA rms. Hence, the 
dynamic range, defined in the same way with the noise integrated over 40 MHz, is also 
70 dB. 
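
As a rough check of the 70 dB figure, the dynamic range can be recomputed from the numbers above. This is only a sketch: it assumes that the measured amplitudes at 1 % THD (50 µA and 35 µA) are peak values of a sinusoidal signal, as indicated in Table 7, and that they are converted to rms before being divided by the simulated rms noise. The function name is illustrative.

```python
import math

def dynamic_range_db(i_peak_a, noise_rms_a):
    """Dynamic range in dB: rms signal at 1 % THD over integrated rms noise."""
    # Assumption: sinusoidal input, so rms = peak / sqrt(2).
    i_rms = i_peak_a / math.sqrt(2.0)
    return 20.0 * math.log10(i_rms / noise_rms_a)

dr_hs = dynamic_range_db(50e-6, 11.2e-9)  # HS: 50 uA peak, 11.2 nA rms noise
dr_fc = dynamic_range_db(35e-6, 8e-9)     # FC: 35 uA peak, 8 nA rms noise

print(f"HS: {dr_hs:.1f} dB, FC: {dr_fc:.1f} dB")
```

With these assumptions both topologies evaluate to approximately 70 dB, in agreement with the values reported in the text.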

In summary, frequency is adjusted in a coarse discrete way by connecting identical 
transconductors in parallel and with fine continuous tuning by varying the biasing current. 

Fig. 18. THD versus differential output current in the HS integrator for three different digital 
words: (a) $\omega$(input) = 10 MHz, (b) $\omega$(input) = $\omega_t$ (25 MHz for 1 $g_m$). 

Fig. 19. THD versus differential output current in the FC integrator for all the digital words: 
(a) $\omega$(input) = 10 MHz, (b) $\omega$(input) = $\omega_t$ (40 MHz for 1 $g_m$). 

The feasibility of the programmable array of transconductors has been proven in a 3-bit 
programmable integrator obtaining frequency scaling as expected. All the specifications in 
both transconductor implementations are summarized in Table 7. The main advantage of the 
topology proposed was the inherent enhancement of the dc-gain, provided through the 
existing positive feedback compensation (negative resistance). 

The HS design condition was very difficult to achieve because technological process and 
temperature variations are expected to be greater than the small changes required in this 
topology. As expected, varying the external control for this negative resistance produced no 
change in the dc-gain. The post-layout simulated dc-gain varied by 15 dB between the 
minimum (40 dB) and the maximum (55 dB), with a maximum CMRR of 60 dB. The 
experimental results give a differential dc-gain of 30 dB that does not change with the value 
of the negative resistance, and a CMRR greater than 35 dB over the entire frequency range. 
Therefore, in this case, there is no control of the dc-gain of the system. 

The design condition for the FC topology is less restrictive, and two different 
implementations have been fabricated. The post-layout simulation results in both cases 
showed a dc-gain control of 15 dB, from 30 to 45 dB, and a maximum CMRR of 50 dB. The 
first implementation was designed with the same dimensions for the Mn transistors 
involved in the negative resistance, and results similar to those of the HS topology are 
obtained: there is no external dc-gain control, and an experimental dc-gain of 26 dB and a 
CMRR of 33 dB are obtained. In the second one, where a pre-designed mismatch is included 
between the Mn transistors involved in the negative resistance, a variation of 12 dB (from 
26 to 38 dB) in the dc-gain is obtained by modifying the value of the negative resistance 
(Fig. 20). The CMRR is greater than 46 dB over the entire frequency range. 





Parameter                                      HS topology      FC topology 
Power supply voltage                           3 V              2 V 
Unity-gain frequency                           25 MHz           40 MHz 
Power dissipation                              1.62 mW          2.4 mW 
CMRR over the entire pass-band                 > 35 dB          > 46 dB 
Active area                                    0.10 mm²         0.04 mm² 
Total rms input-referred noise (sim.)          11.2 nA rms      8 nA rms 
Maximum differential input signal 
current at 1 % THD @ 10 MHz                    50 µA (peak)     35 µA (peak) 
Dynamic range                                  70 dB            70 dB 



Table 7. Summary of the experimental results for the integrator (1 LSB). 




[Fig. 20 plot; horizontal axis: frequency (MHz)] 



Fig. 20. Experimental dc-gain control for the FC transconductor with a pre-designed 
mismatching between Mn transistors involved in the negative resistance. 



9. Conclusion 

This work describes a new approach for implementing digitally programmable and 
continuously tunable VHF/UHF transconductors compatible with pure digital CMOS 
technologies and suitable for HDD read channel applications. The cell is suitable for 
low-voltage operation over an extended frequency range. The programmability of the 
transconductor comes from a generic programmable structure that provides digital control 
of $G_m$ through a parallel connection of unit cells, while the total parasitic capacitances 
are kept constant thanks to the specific design of the unit cell: a cascode stage with 
dummy elements. This transconductor could be used in any kind of $G_m$-C filter, thus 
providing a very wide range of programmable CT filters. The fully-balanced current-mode 
$G_m$-C integrator based on this topology exhibits a unity-gain frequency programmability 
of 25-185 MHz in the HS implementation and 40-200 MHz in the FC approach, with a 
phase error of less than 4° in both topologies throughout the entire operating frequency 
range. A total harmonic distortion (THD) of less than 1 % (-40 dB) is obtained for a 
differential input signal of 50 and 35 µA in the HS and FC topology, respectively. The 
integrator operates over the programming range with 70 dB of dynamic range for 1 % THD. 
The cell has been fabricated in a 0.35 µm CMOS process. 

The experimental results confirm this approach as an excellent choice to achieve filters 
exhibiting a good trade-off between tuning capability and dynamic range working in the 
very high frequency range. The proposed technique can be easily adapted to lower power 
supply voltages by using folded cascode structures and, in addition, better frequency ranges 
of operation can be achieved considering current CMOS digital technologies. 

1. Introduction 

Nowadays, non-volatile storage technologies play a fundamental role in the semiconductor 
memory market due to the widespread use of portable devices such as digital cameras, MP3 
players, smartphones, and personal computers, which require ever increasing memory 
capacity to improve their performance. Although, at present, Flash memory is by far the 
dominant semiconductor non-volatile storage technology, the aggressive scaling aiming at 
reducing the cost per bit has recently brought the floating-gate storage concept to its 
technological limit. In fact, data retention and reliability of floating-gate based memories are 
related to the thickness of the gate oxide, which becomes thinner and thinner with 
increasing downscaling. The above limit has pushed the semiconductor industry to invest 
in alternatives to Flash memory technology, such as magnetic memories, ferroelectric 
memories, and phase change memories (PCMs) (Geppert, 2003). The last technology is one 
of the most interesting candidates due to high read/write speed, bit-level alterability, high 
data retention, high endurance, good compatibility with CMOS fabrication process, and 
potential of better scalability. However, it still requires strong efforts to be optimized in 
order to compete with Flash technology from the cost and the performance points of view. 
In PCMs, information is stored by exploiting two different solid-state phases (namely, the 
amorphous and the crystalline phase) of a chalcogenide alloy, which have different electrical 
resistivity (more specifically, the resistivity is higher for the amorphous, or RESET, phase 
and lower for the crystalline, or SET, phase). Phase transition is a reversible phenomenon, 
which is achieved by stimulating the cell by means of adequate thermal pulses induced by 
applying electrical pulses. Reading the resistance of any programmed cell is achieved by 
sensing the current flowing through the chalcogenide alloy under predetermined bias 
voltage conditions. The read window, that is, the range from the minimum (RESET) to the 
maximum (SET) read current, is considerably wide, which allows safe storage of an 
information bit in the cell and also opens the way to the multi-level (ML) approach to 
achieve low-cost high-density storage. ML storage consists in programming the memory cell 
to one of a plurality of intermediate resistance (i.e., read current) levels inside the available 
window, which allows storing more than one bit per cell (the number of bits that can be 
stored in a single cell is $n = \log_2 m$, where $m$ is the number of programmable levels). 
The programming power and the read window depend on the electrical properties of the cell 
materials as well as on the architecture and the size of the memory cell. As the fabrication 
technology scales down the cell dimensions, new challenges arise in accurately programming 
the cell to intermediate states and discriminating adjacent resistance levels. 

In this work, we investigate the impact of technology scaling down on both the program 
and the read operation by means of a simple analytical model which takes the electro- 
thermal behavior of the PCM cell and the phase change phenomena inside the chalcogenide 
alloy into account. 



2. Working principle of the PCM cell 

The working principle of a PCM cell relies on the physical properties of chalcogenide 
materials, typically Ge2Sb2Te5 (GST), that can switch from the amorphous to the crystalline 
phase and vice versa when stimulated by suitable electrical pulses. Basically, a PCM cell is 
composed of a thin GST film, a resistive element named heater (TiN), and two metal 
electrodes, i.e., the top electrode contact (TEC) and the bottom electrode contact (BEC). Only 
a portion of the GST layer, which is located close to the GST-heater interface and is referred 
to as active GST, undergoes phase transition when the PCM cell is thermally stimulated. In 
particular, in this work we focus our attention on the Lance heater geometry (Pellizzer et al., 
2006), which is essentially composed of a thin layer of GST alloy and a pillar-shaped heater, 
as shown in Fig. 1. In the reference Lance heater cell implemented in the 90 nm technology 
node, the GST thickness $t$ is 70 nm, the GST-heater contact area $A$ is 3000 nm², and the 
heater height $h$ is 180 nm. 

The typical V-I characteristic of the PCM cell in the amorphous (RESET) and the crystalline 
(SET) state is shown in Fig. 2. Consider the case of a cell in its full-SET state: the differential 
resistance of the cell decreases as the applied voltage increases. This effect is due to the 
contribution of the crystalline GST to the cell resistance. In fact, the crystalline GST 
resistivity decreases with increasing electrical field inside the material. 




Fig. 1. Conceptual scheme of a PCM Lance heater cell. 

Fig. 2. V-I curve of a PCM device in the SET and the RESET state. 

The V-I curve of the cell in its RESET state shows an S-shaped behavior. This effect is due to 
the threshold switching phenomenon (Adler et al., 1980; Ovshinsky, 1968; Pirovano et al., 
2004; Thomas et al., 1976) which consists in a sudden drop of the amorphous GST resistivity 
as the voltage across the PCM cell exceeds a critical value, typically referred to as threshold 
voltage, $V_{th}$. Thus, when low-amplitude voltage pulses are applied to the cell, a low current 
flows through the device, which is in its high-resistance state (OFF region in Fig. 2). On the 
other hand, when a high-amplitude voltage pulse is applied to the cell, threshold switching 
takes place and the device shows a much lower resistance (ON region in Fig. 2). It can be 
noted that the V-I curves of the cell in the two states (SET and RESET) are almost 
superimposed in the ON region, while they are substantially different in the OFF region. 
Thus, readout must be carried out by operating the cell in the OFF region. Typically, a 
predetermined read voltage is applied to the cell and the current flowing through the 
device, referred to as read current, is sensed (current sensing approach). The read voltage 
must be low enough to avoid unintentional modification of the cell contents due to the read 
pulse. On the other hand, writing is carried out by operating the cell in the ON region, in 
order to provide the device with enough energy to induce phase change. Since phase 
transitions are thermally assisted, in PCM devices Joule heating is exploited to raise the 
temperature inside the chalcogenide material to the required value. The 
crystalline-to-amorphous phase transition is obtained by applying a high-amplitude 
electrical pulse to the cell so as to bring the temperature of the active GST material above 
the melting point $T_m$ (about 600 °C) (Peng et al., 1997), and then quickly cooling the 
memory cell, in order to freeze the GST material into a disordered (i.e., amorphous) 
structure. A pulse duration on the order of a few tens of ns is sufficient (Weidenhof et al., 
2000). The amorphous-to-crystalline phase transition is obtained by applying an electrical 
pulse with a lower amplitude and a longer duration. In this case, the amorphous material is 
heated to a temperature below the melting point but above the crystallization temperature, 
that is, the temperature necessary to activate the crystallization process in the required time 
scale (typically on the order of 100 ns). This way, the thermal energy is able to restore the 
crystalline lattice, which is a minimum-energy configuration. Typical electrical pulses for 
SET and RESET operations are shown in Fig. 3. 

Fig. 3. Standard pulses for bi-level PCM programming. 

Fig. 4. Architecture of a PCM matrix (a) and schematic of the circuit used to program and 
read the memory cell (b). Transistor $M_{sel}$ is the row select transistor. 

A PCM memory chip is made of a large number of PCM cells organized in a bi-dimensional 
array. As opposed to the case of Flash memories, in which the elementary storage consists of 
a floating-gate transistor, the PCM memory cell is a programmable resistor and, hence, is a 
two-terminal device. For this reason, a NOR type architecture is adopted (Fig. 4a). As shown 
in Fig. 4b, each memory cell consists of a PCM storage element connected to a selection 
transistor $M_{sel}$, which can be either a MOS or a bipolar device. The gates or the bases of 
all select transistors of the same row are connected to the same word-line, while the TECs of 
the PCM cells belonging to the same column are connected to the same bit-line. The memory 
cell is selected by means of row and column decoders that generate the electrical control 
signals required for read and write operations. 

3. Programming operation 

We first analyzed the impact of technology scaling on the programming operation, focusing 
our attention on the electrical power (hereinafter referred to as programming power). The 
maximum programming power is required by the RESET operation, where the highest 
temperatures are needed to melt the active GST volume. The RESET pulse duration must be 
longer than the minimum time required for melting (Weidenhof et al., 2000), while the 
cooling time must be short enough to prevent the crystallization process from taking place. 
The minimum current required to melt a portion of the active GST layer is referred to as 
melting current, $I_m$. When the current flowing through the memory cell during a write 
operation is higher than $I_m$, the obtained RESET resistance increases with the amplitude of 
the current pulse. In fact, the maximum temperature inside the cell increases with the pulse 
amplitude, thus leading to the amorphization of a larger GST volume. 

The maximum temperature reached inside a Lance heater cell of given sizes can be 
estimated by means of an approximated electro-thermal model. In general, the temperature 
increase in the active GST volume is due to the current flow both through the heater (heater 
heating) and through the GST layer itself (GST self-heating). Nevertheless, GST self-heating 
can be neglected when considering high-amplitude RESET pulses. In fact, the resistance of 
the GST layer (both in the crystalline and in the amorphous state) is negligible with respect 
to the heater resistance due to high-field effects (the PCM cell is operated in the ON region). 
Thus, in this case we can estimate the temperature profile inside the PCM cell by 
considering only the Joule power generated inside the heater when a current I flows 
through the cell. We assume, for simplicity, a cylindrical geometry of the heater and 
calculate the temperature along the cell axis. The power generated in a volume $A\,\delta z$ 
located at a distance $z$ from the heater-BEC contact is equal to 
$\delta Q = \frac{\rho_h I^2}{A}\,\delta z$, $\rho_h$ being the heater electrical resistivity, and 
contributes to the temperature increase $\Delta T$ at the heater-GST interface with a term 
$\delta T$ given by 

$$\delta T = \left[\left(R_{th,GST} + R_u(z)\right) \parallel R_d(z)\right] \frac{R_{th,GST}}{R_{th,GST} + R_u(z)}\,\delta Q, \qquad (1)$$ 

where $R_u(z) = \frac{h - z}{k_h A}$ and $R_d(z) = \frac{z}{k_h A}$ ($k_h$ being the thermal 
conductivity of the heater material) are the heater thermal resistance from the coordinate $z$ 
to the heater-GST contact and to the heater-BEC contact, respectively, and $R_{th,GST}$ is the 
equivalent thermal resistance of the GST layer. 

By integrating Eq. (1) along the cell axis from the BEC-heater contact ($z = 0$) to the 
heater-GST contact ($z = h$), we obtain the temperature $T$ at the interface: 

$$T = \frac{I^2 \rho_h h}{A}\,\frac{R_{th,GST}\,R_{th,h}}{2\left(R_{th,GST} + R_{th,h}\right)} + T_0 \qquad (2)$$ 

$$T = \frac{Q_J}{2}\left(R_{th,GST} \parallel R_{th,h}\right) + T_0. \qquad (3)$$ 

In the above equations, $T_0$ is room temperature, $Q_J = \frac{I^2 \rho_h h}{A}$ is the Joule 
power delivered to the cell during the RESET pulse, and $R_{th,h}$ is the thermal resistance of 
the heater, which can be expressed as $\frac{h}{k_h A}$. 

From Eq. (2), taking the expression of $R_{th,h}$ into account, $I_m$ is given by 

$$I_m = \sqrt{\frac{2A\,(T_m - T_0)\left(R_{th,GST} + R_{th,h}\right)}{\rho_h h\,R_{th,GST}\,R_{th,h}}}. \qquad (4)$$ 

In order to estimate the dependence of $R_{th,GST}$ on the geometrical features of the memory 
cell, we simulated the temperature profile along the cell axis inside the GST layer (Fig. 5a). 
Fig. 5b shows the simulation results for different values of the GST layer thickness obtained 
with our previously proposed 3D model (Braga et al., 2008). It can be noticed that the 
temperature decreases almost linearly inside the GST layer with increasing distance from 
the GST-heater contact. Moreover, the accuracy of the linear approximation increases as the 
ratio between the GST layer thickness and the heater radius decreases. Since this behavior 
suggests that heat flow inside the GST is substantially directed along the cell axis, away 
from the heater-GST interface, a reasonable approximation for the thermal resistance of the 
GST layer is $R_{th,GST} = \frac{t}{k_{GST} A}$, where $k_{GST}$ is the thermal conductivity of the 
GST. Thus, we can rewrite Eq. (4) as 

$$I_m = \frac{A}{h}\sqrt{\frac{2\,(T_m - T_0)}{\rho_h}\left(k_h + k_{GST}\,\frac{h}{t}\right)}. \qquad (5)$$ 

As highlighted by Eq. (5), the melting current depends on the ratios $\frac{A}{h}$ and $\frac{h}{t}$. 
Due to fabrication process constraints, heater geometries with a high aspect ratio (i.e., 
geometries having a high ratio between the GST-heater contact diameter and the heater 
height) may not be easily manufacturable. Several fabrication solutions have been proposed 
to overcome lithographic limits and, thus, realize heater structures with minimized contact 
area (Lam, 2006; Pirovano et al., 2008). In the following, we will consider heater geometries 
with a high aspect ratio with the purpose of investigating the scaling perspective, even if 
they may require advanced fabrication techniques. Given a scaling factor $\varepsilon < 1$, 
$I_m$ turns out to be proportional to $\varepsilon$ in the case of isotropic scaling, where all 
the linear dimensions are scaled by the same amount, while $I_m \propto \varepsilon^2$ in the 
case of shrinking, where only planar dimensions are scaled. The comparison of melting 
current reduction in the cases of isotropic scaling and shrinking is shown in Fig. 6. 
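
The two scaling laws can be illustrated numerically. The sketch below assumes that substituting $R_{th,h} = h/(k_h A)$ and $R_{th,GST} = t/(k_{GST} A)$ into Eq. (4) gives $I_m = (A/h)\sqrt{2(T_m - T_0)(k_h + k_{GST} h/t)/\rho_h}$. The geometry and the melting point are those quoted in the text; the material parameters and the room temperature are placeholder values chosen only for illustration and are not taken from Table 1. Only the ratios to the reference cell are printed, since these do not depend on the placeholder values.

```python
import math

# Reference Lance cell geometry quoted in the text (90 nm node).
A_REF = 3000e-18   # GST-heater contact area, m^2
H_REF = 180e-9     # heater height, m
T_REF = 70e-9      # GST layer thickness, m

# Placeholder material parameters (assumed for illustration, NOT from Table 1).
RHO_H = 2e-5       # heater electrical resistivity, ohm*m
K_H = 10.0         # heater thermal conductivity, W/(m*K)
K_GST = 0.5        # GST thermal conductivity, W/(m*K)
T_MELT = 600.0     # melting point, deg C (from the text)
T_ROOM = 25.0      # room temperature, deg C (assumed)

def melting_current(area, h, t):
    """I_m = (A/h) * sqrt(2 (Tm - T0) / rho_h * (k_h + k_GST * h / t))."""
    return (area / h) * math.sqrt(
        2.0 * (T_MELT - T_ROOM) / RHO_H * (K_H + K_GST * h / t)
    )

def isotropic(eps):
    """All linear dimensions scaled by eps (the area scales by eps**2)."""
    return melting_current(A_REF * eps**2, H_REF * eps, T_REF * eps)

def shrink(eps):
    """Only planar dimensions scaled by eps; h and t are unchanged."""
    return melting_current(A_REF * eps**2, H_REF, T_REF)

i_ref = melting_current(A_REF, H_REF, T_REF)
for eps in (1.0, 0.75, 0.5):
    print(f"eps={eps:.2f}  isotropic: {isotropic(eps) / i_ref:.3f}  "
          f"shrink: {shrink(eps) / i_ref:.3f}")
```

In this form the isotropic case changes only the ratio $A/h$, which scales as $\varepsilon$, while shrinking leaves $h$ and $t$ untouched so that $I_m$ follows the contact area, that is, $\varepsilon^2$.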

In order to compare PCM cells having different dimensions, we chose to consider the full- 
RESET state to be achieved when the maximum temperature inside the PCM cell reaches a 

Fig. 5. Cell structure (a) and simulated temperature maps inside a Lance heater PCM cell 
with different values of GST layer thickness: 40 nm, 70 nm, and 100 nm (b). Notice that the 
temperature profile is almost linear inside the GST layer. The maps were obtained by means 
of our 3D electro-thermal model (Braga et al., 2008). 

Fig. 6. Melting current reduction in the case of isotropic scaling (left) and shrinking (right). 
The dimensions are scaled with respect to a reference Lance heater cell realized in 90 nm 
technology. 






Advances in Solid State Circuits Technologies 




[Fig. 7 plot; horizontal axis: contact area (nm²), from 500 to 3000] 



Fig. 7. Map of the RESET current as a function of the GST-heater contact area and the heater 
height (the GST layer thickness was set to 70 nm). 

Fig. 8. RESET current dependence on the geometrical parameters of the memory cell. 

predetermined value, $T_{RST}$, which is obtained with a current pulse of amplitude $I_{RST}$. 
Typically, $I_{RST}$ is 50% higher than $I_m$. Different cells require different pulse 
amplitudes ($I_{RST}$) to reach $T_{RST}$, due to the different values of the electrical and the 
thermal resistance of the device. The dependence of the RESET current on cell sizes obtained 
by means of Eq. (4) is sketched in Fig. 7 and Fig. 8. The reduction of the heater height leads 
to a significant increase of $I_{RST}$ due to the decrease of the Joule power and of the heater 
thermal resistance. On the contrary, the reduction of the contact area only, that is, the 
shrinking approach, leads to a linear decrease of the RESET current, due to the increase of 
the Joule power and of the thermal resistance of the cell. The same behavior is obtained 
when considering the scaling of the GST layer thickness. 

The values of the electrical and thermal properties used in the above simulations are 
summarized in Tab. 1. For simplicity, the field dependence of the crystalline GST resistivity 
was neglected. In order to validate the described analytical compact model, we compared 
the temperature profiles along the cell axis obtained with this model and our 3D finite- 
element model (Fig. 9). 

Fig. 9. Comparison of the thermal profile along the cell axis obtained by means of the 
analytical model and the 3D finite-element model. 

Table 1. Electrical and Thermal Properties of Cell Materials. 

A good agreement is observed, especially inside the GST layer. The slight temperature 
disagreement inside the heater is ascribed to the inhomogeneous heat flow in the material 
that surrounds the heater. To take this heat-loss contribution into account, the value of 
$k_h$ used in the compact model was set higher than the actual physical value. 



4. Read operation 

The GST layer undergoes the crystalline-to-amorphous phase transition in the region where 
the temperature exceeds the melting point. As pointed out above, the temperature profile 
along the cell axis inside the GST decreases almost linearly with the distance from the 
GST-heater interface. By approximating the thermal profile inside the GST along the cell axis 
with a straight line, we derived the analytical expression for the thickness of the amorphous 
cap $x_a$ obtained when a full-RESET pulse is applied to the cell: 

$$x_a = t\,\frac{T_{RST} - T_m}{T_{RST} - T_0}. \qquad (6)$$ 

Thus, the thickness of the amorphous cap obtained by means of the RESET operation is a 
fraction $f = \frac{T_{RST} - T_m}{T_{RST} - T_0}$ of the GST layer thickness (Braga et al., 2009). 
The volume of amorphous GST determines the value of the GST resistance in the RESET state 
and, thus, the lower edge of the read window. Since the temperature gradient is much higher 
along the cell axis than along the other two axes, the ratio between the thickness and the 
width of the amorphous cap is quite high, thus allowing us to estimate the amorphous GST 
resistance in the full-RESET state as 

$$R_{RST} = \rho_a \frac{x_a}{A} + R_h \approx \rho_a \frac{x_a}{A}, \qquad (7)$$ 

where $\rho_a$ is the amorphous GST resistivity and the heater resistance $R_h$ has been 
neglected since it is much lower than the resistance of the GST layer after the full-RESET pulse. 
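
To give a feeling for the size of the amorphous cap, the fraction $f$ can be estimated without knowing the absolute temperatures. The sketch below assumes that $T_{RST}$ is the heater-GST interface temperature predicted by the compact model, so that the temperature rise above $T_0$ is proportional to $I^2$ as in Eq. (2), and that the material parameters do not depend on temperature. Under these assumptions $f = 1 - (I_m/I_{RST})^2$. The ratio $I_{RST} = 1.5\,I_m$ and the 70 nm thickness are the typical values quoted in Section 3; the function name is illustrative.

```python
def amorphous_fraction(i_rst_over_i_m):
    """f = (T_RST - T_m) / (T_RST - T_0), using (T - T_0) proportional to I**2."""
    return 1.0 - 1.0 / i_rst_over_i_m**2

t_gst_nm = 70.0                 # GST thickness of the reference cell, nm
f = amorphous_fraction(1.5)     # I_RST assumed 50% higher than I_m
x_a_nm = f * t_gst_nm           # thickness of the amorphous cap, nm

print(f"f = {f:.3f}, x_a = {x_a_nm:.1f} nm")
```

With these inputs a little more than half of the GST layer thickness is amorphized, which is the quantity that enters Eq. (7) through $x_a$.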

In order to estimate the cell resistance in the full-SET state, by neglecting the current spread 
inside the crystalline GST, we can write: 

$$R_{SET} = \rho_c \frac{t}{A} + R_h, \qquad (8)$$ 

where $\rho_c$ is the resistivity of crystalline GST. 

When considering the current sensing approach, we can calculate the minimum and the 
maximum read current: 

$$I_{rd,min} = \frac{V_{read}}{R_{RST}}, \qquad (9)$$ 

$$I_{rd,max} = \frac{V_{read}}{R_{SET}}, \qquad (10)$$ 

where $V_{read}$ is the amplitude of the read voltage. $V_{read}$ must be low enough to avoid 
unintended programming during readout. The read current window is affected by both the 
scaling of $V_{read}$ and the geometrical scaling strategy. It must be pointed out that when 
$V_{read}$ is kept constant (this approach will be referred to as constant voltage approach), the 
electric field $E_{read}$ inside the amorphous GST during readout increases as the size of the 
amorphous cap scales ($E_{read} \approx \frac{V_{read}}{x_a}$), thus affecting the electrical 
resistivity of the amorphous GST. In this case, in order to calculate the read current, the 
exponential dependence of the amorphous GST resistance on the electric field must be taken 
into account (Ielmini & Zhang, 2007; Kim et al., 2007). For a given PCM cell in the RESET 
state, neglecting the heater resistance, we have 



$$R_{RST} \propto e^{-E_{read}/E_{ref}}, \qquad (11)$$ 

where $E_{ref}$ is the electric field which activates the electrical resistivity inside the 
amorphous GST. The value of $V_{read}$ must be chosen so as to ensure that the PCM device is 
operated in the read region (OFF zone) and the electric field during readout is below the 
critical switching field for every considered cell size. In this respect, we chose 
$V_{read}$ = 0.3 V and calculated the cell resistance and the read current for both the SET and 
the RESET state. $E_{ref}$ was set to 30 MV/m (Buckley & Holmberg, 1974). 

Several studies (Adler et al., 1980; Buckley & Holmberg, 1974) have shown that $V_{th}$ 
decreases linearly with the amorphous GST thickness which, in our case, is a fraction of the 
GST layer thickness. Then, we can scale $V_{read}$ and $t$ consistently, so as to keep the 
electric field during readout inside the amorphous GST roughly constant and below the 
critical value for threshold switching (Buckley & Holmberg, 1974). This scaling approach 
will be referred to as constant field scaling. 
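
The difference between the two read strategies can be made concrete by evaluating the readout field for a few GST thicknesses. This is a minimal sketch under stated assumptions: the field is approximated as $E_{read} \approx V_{read}/x_a$ with $x_a = f\,t$ (heater voltage drop neglected), the cap fraction $f = 5/9$ is an assumed value corresponding to $I_{RST} = 1.5\,I_m$ in the compact model, and the thicknesses are example values. The reference point (0.3 V at $t$ = 70 nm) is the one used in the text.

```python
F_AMORPHOUS = 5.0 / 9.0        # assumed cap fraction (I_RST = 1.5 * I_m)
V_REF = 0.3                    # read voltage of the reference cell, V
T_REF_NM = 70.0                # GST thickness of the reference cell, nm

def read_field(v_read, t_nm):
    """E_read ~ V_read / x_a in V/m, with x_a = f * t (heater drop neglected)."""
    return v_read / (F_AMORPHOUS * t_nm * 1e-9)

for t_nm in (100.0, 70.0, 40.0):
    e_cv = read_field(V_REF, t_nm)                     # constant voltage
    e_cf = read_field(V_REF * t_nm / T_REF_NM, t_nm)   # constant field
    print(f"t={t_nm:5.1f} nm  const-V: {e_cv / 1e6:5.2f} MV/m  "
          f"const-field: {e_cf / 1e6:5.2f} MV/m")
```

With a constant read voltage the field grows as the layer gets thinner, whereas scaling $V_{read}$ in proportion to $t$ keeps it at the value of the reference cell.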

It can be noticed from the simulation results in Fig. 10 that the constant voltage approach 
leads to an increase of the SET read current as the thickness of the GST layer decreases, due 
to the reduction of the SET resistance. Moreover, a significant increase of the minimum 
current (RESET state), mainly due to the dependence of amorphous GST resistivity on the 
electric field, is apparent. The increase of the RESET read current depends on $E_{ref}$ and is 
affected by the value of $V_{read}$. Rather different results are obtained when considering 
constant field scaling. In this case, the current read window scales as shown in Fig. 11. The 
RESET current is almost independent of $t$ and $h$, since the read voltage and the cell 
resistance roughly scale by the same factor. In contrast to the previous approach, in constant 
field scaling the SET read current decreases with decreasing $t$ because $R_{SET}$ is less 
affected than $V_{read}$ by the reduction of $t$. The dependence of $I_{read}$ on the contact 
area is qualitatively similar to the constant voltage case. In both approaches, $I_{read}$ 
progressively decreases with decreasing $A$. 

Fig. 10. Constant voltage approach: read current as a function of the contact area $A$ for 
different values of GST layer thickness $t$ and heater height $h$. The read voltage is assumed 
to be 0.3 V. 

Fig. 11. Constant field approach: read current as a function of the contact area $A$ for 
different values of GST layer thickness $t$ and heater height $h$. The read voltage is assumed 
to be proportional to the thickness of the GST layer ($V_{read}$ = 0.3 V @ $t$ = 70 nm). 



5. Conclusions 

In this work, we addressed the impact of technology scaling on the performance of phase 
change memory cells by investigating its effects on both the programming current and the 
width of the read window. To this end, we derived a simplified analytical model of the PCM 
cell electro-thermal behavior and validated it by means of a 3D finite-element model of the 
PCM cell. We considered both constant field and constant voltage scaling approaches. Our 
study highlights the program-read trade-offs that aggressive scaling raises and provides 
analytical insight into the scaling mechanisms. 

1. Introduction 



Electrostatic discharge (ESD) failure is one of the most important causes of reliability 
problems; therefore, ESD protection devices must be carefully designed and optimized. To 
achieve a very short time to market and reduce the development effort, designers try to take 
advantage of simulation tools. However, due to the complex physical mechanisms of ESD 
events and the numerical difficulty of the calculation in the snapback region, simulation of 
the I-V characteristic of ESD protection devices has proved to be difficult. 

This chapter aims to provide a systematic approach to ESD simulation, covering process
simulation, device simulation and circuit-level simulation. Process/device simulation offers
an effective way to evaluate the performance of ESD protection structures. However,
protection against ESD damage is sometimes provided by protection circuits, and in that
case circuit-level simulation is also needed.

Several process/device simulation tools are available, the most widely used of
which are Tsuprem4/Medici, Athena/Atlas and Dios/Mdraw/Dessis. Tsuprem4,
Athena and Dios are process simulators, while Medici, Atlas and Dessis are device
simulators. Mdraw is an independent mesh optimization tool; similar functions are
integrated in device simulation tools such as Medici and Atlas. The process and device
simulation methods introduced below are based on Dios/Mdraw/Dessis,
except for the mixed-mode simulation, which is based on Tsuprem4/Medici. The circuit-level
simulation is carried out on the Cadence platform.

2. Process simulation 

The starting point of ESD simulation is to construct an electronic representation of the
device, which can be generated either by manual device set-up or by process simulation.
Process simulation provides the more realistic description of the device. The guiding
principle of process simulation is to minimize the errors carried into the subsequent device
simulation, so the physical models must be chosen carefully. The most
important process steps are implantation and diffusion, which are discussed
below.

Taking Dios as an example, this section introduces the physical models used for implantation
and diffusion. The implantation models in Dios consist of analytic implantation
models and a Monte Carlo implantation model. The Monte Carlo model simulates implantation at
the atomic level and is very time-consuming, so in most cases it is not suitable
for ESD simulation. The analytic implantation models are described by a series of distribution
functions: the Gauss distribution function, the Pearson distribution function, the Pearson-IV
distribution function (P4), the Pearson-IV distribution with linear exponential tail
(P4S), the Pearson-IV distribution with general exponential tail (P4K), the Gauss
distribution with general exponential tail (GK), the jointed half-Gauss distribution
(JHG), and the jointed half-Gauss distribution with general exponential tail
(JHGK). These eight distribution functions are called single primary distribution functions.
Their expressions are lengthy and are not discussed here; all of them can
be found in the DIOS User's Manual.

The single primary distribution functions describe the relationship between the impurity
distribution and seven key parameters, which are determined by the implantation process step.
The seven key parameters are RP ($R_p$), STDV ($\sigma_p$), STDVSec ($\sigma_{p2}$), GAMma ($\gamma$),
BETA ($\beta$), LEXP ($l_{exp}$) and LEXPOW ($\alpha$). The range of each parameter that must be
specified for each of the single primary distribution functions is shown in Table 1. Each
entry of Table 1 states whether the parameter must be a real number (marked x), must be
nonnegative, must be positive, or is not allowed for the particular function. Once
the implanted element, energy, dose, tilt and rotation of an implantation step are
defined by the user, the relevant parameter set is looked up in the implant tables, and
the impurity distribution is then calculated from it. Users who have parameters
fitted to experimental data can specify the parameter set directly in the implantation command.

Table 1. Range of parameter specification for the distribution functions

According to the simulation results, the single primary distribution functions can be divided
into three groups. Group 1 contains the Pearson distribution function; group 2 contains the
P4, P4S and P4K distribution functions; group 3 contains the Gauss, GK, JHG and JHGK
distribution functions. Fig. 1 (a) shows the 2D impurity distribution obtained with the
different implantation models; Fig. 1 (b) shows the impurity distribution along the Y
direction. From Fig. 1 (a) and Fig. 1 (b), we can see that
functions in the same group give similar simulation results. In practice, the distribution
functions in group 3 are usually used for deep implantations, such as the WELL implantation
in a CMOS process, and the distribution functions in group 1 and group 2 are usually used for
shallow implantations, such as the drain/source implantation in a CMOS process.

To obtain more accurate simulation results, ion channeling should be taken into
consideration, which requires the dual primary distribution functions. That is, the profile
is divided into two components: the first represents the profile of the ions that
do not channel, and the second represents the channeled ions. A dual primary distribution
function is obtained by specifying two single primary functions for these two components.
It can be defined in the implantation command in the following format:

Implantation (..., Function=(function1,function2))
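
To make the idea of a dual primary distribution concrete, the following sketch adds a non-channeled and a channeled component, each modeled as a plain Gauss profile defined by a projected range and a standard deviation. It is only an illustration: the function names, the channeled fraction `ratio` and all numeric values are assumptions chosen for the example, not Dios syntax or Dios defaults.

```python
import math


def gauss_profile(x_um, dose_cm2, rp_um, stdv_um):
    """Gauss profile: concentration (cm^-3) at depth x_um for a given dose (cm^-2)."""
    sigma_cm = stdv_um * 1e-4  # convert micrometers to centimeters
    peak = dose_cm2 / (math.sqrt(2.0 * math.pi) * sigma_cm)
    return peak * math.exp(-((x_um - rp_um) ** 2) / (2.0 * stdv_um ** 2))


def dual_profile(x_um, dose_cm2, ratio, main, channel):
    """Sum of a non-channeled and a channeled component.

    ratio is the (hypothetical) fraction of the dose that channels;
    main and channel are (RP, STDV) pairs in micrometers.
    """
    rp1, s1 = main
    rp2, s2 = channel
    return (gauss_profile(x_um, (1.0 - ratio) * dose_cm2, rp1, s1)
            + gauss_profile(x_um, ratio * dose_cm2, rp2, s2))


if __name__ == "__main__":
    # Illustrative numbers only: 1e15 cm^-2 dose, 10 % of the ions channel.
    for x in (0.05, 0.10, 0.20, 0.40):
        n = dual_profile(x, 1e15, 0.1, main=(0.10, 0.03), channel=(0.25, 0.08))
        print(f"x = {x:.2f} um  N = {n:.3e} cm^-3")
```

The channeled component has a deeper range and a wider spread, so it dominates the tail of the profile, which is the part a single primary function tends to underestimate.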



[Figure 1 (a) panels, one per implantation function: Pearson, P4S, P4, JHGK, Gauss, P4K, JHG, GK]

Fig. 1. (a) 2D impurity distribution

Fig. 1. (b) Impurity distribution along Y direction

DIOS provides five models for the diffusion process step: Conventional, Equilibrium, Loosely
coupled, Semicoupled, and Pairdiffusion. The Conventional model is the simplest and
consumes the least time, while the Pairdiffusion model is the most accurate but
consumes the most time. For ESD simulation, the Pairdiffusion model is the better choice,
because it gives the best boundary shape, which helps to avoid convergence problems in
the subsequent device simulation.

After the proper physical models have been selected, the process simulation can be carried
out, and the resulting electronic representation of the device is imported into the mesh
optimization tool Mdraw. After mesh optimization, the structure is ready for device simulation.

3. Device simulation 

Device simulation is based on solving a set of mathematical and physical equations. The
physical parameters used in these equations are described by physical models,
some of which are taken from the literature while others are fitted by the software
developers. The parameter sets of the physical models are based on data from several
process technologies and cannot cover every process technology. Therefore, for a specific
process technology, some parameters of the physical models should be modified. To simulate
an ESD event correctly, accurate physical models and proper parameter sets are the most
important factors, no matter which simulation method is chosen.

To account for high electric field and high temperature effects during an ESD event, the
following physical models in ISE TCAD must be included:

1) Fermi-Dirac statistics. When the carrier density exceeds $1\times10^{19}$ cm$^{-3}$, the default
Boltzmann statistics are no longer suitable for simulation.
2) An accurate effective intrinsic carrier density model that includes band gap narrowing
and the Fermi correction.
3) A comprehensive mobility model that takes doping dependence, carrier-carrier scattering
and high field saturation into consideration (in MOS devices, surface mobility degradation
due to acoustic surface phonons and surface roughness should also be considered).
4) A recombination model containing both the Shockley-Read-Hall (SRH) model and the Auger
model; the SRH model should take doping dependence, temperature dependence and
field-enhanced recombination into consideration.
5) Avalanche generation.
6) A thermodynamic model that accounts for the self-heating effect.
7) A thermoelectric power model.

In simulating ESD events, three physical parameters are the most important: the carrier
mobility ($\mu$), the free-carrier lifetime ($\tau$), and the generation rate ($G$), which is
dominated by impact ionization. Mobility is described in ISE TCAD with several degradation
models, as listed above. Taking all of these into consideration, the mobility is finally
formulated as:

$$\mu = f(\mu_{low}, F) \qquad (1)$$

The function $f$ is determined by the model chosen for high field saturation, and $\mu_{low}$ in
Eq. (1) is formulated as:

$$\frac{1}{\mu_{low}} = \frac{1}{\mu_{dop}} + \frac{1}{\mu_{eh}} + \frac{D}{\mu_{ac}} + \frac{D}{\mu_{sr}} \qquad (2)$$

In Eq. (2), $\mu_{dop}$ represents the doping-dependent mobility degradation, $\mu_{eh}$ is the
mobility due to carrier-carrier scattering, $\mu_{ac}$ is the surface contribution due to
acoustic surface phonons, $\mu_{sr}$ is the surface contribution attributed to surface roughness,
and $D$ is given by:

$$D = \exp(-x / l_{crit}) \qquad (3)$$

where $x$ is the distance from the interface and $l_{crit}$ is a fit parameter. $\mu_{ac}$ and $\mu_{sr}$ can be
ignored in non-surface devices.
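
The way the individual contributions combine can be sketched in a few lines: reciprocal mobilities are added, and the two surface terms are weighted by the damping factor D = exp(-x/l_crit). The function and argument names below are hypothetical and the numbers in the usage example are placeholders, not ISE TCAD parameters.

```python
import math


def surface_damping(x_cm, l_crit_cm):
    """Damping factor D = exp(-x / l_crit) for the surface contributions."""
    return math.exp(-x_cm / l_crit_cm)


def low_field_mobility(mu_dop, mu_eh, mu_ac=None, mu_sr=None, d=0.0):
    """Combine mobility contributions by adding their reciprocals.

    mu_ac and mu_sr are optional; leaving them as None corresponds to a
    non-surface device, where the surface terms are ignored.
    """
    inverse = 1.0 / mu_dop + 1.0 / mu_eh
    if mu_ac is not None:
        inverse += d / mu_ac
    if mu_sr is not None:
        inverse += d / mu_sr
    return 1.0 / inverse


if __name__ == "__main__":
    bulk = low_field_mobility(1000.0, 1000.0)
    d = surface_damping(x_cm=1e-7, l_crit_cm=1e-6)
    surface = low_field_mobility(1000.0, 1000.0, mu_ac=800.0, mu_sr=600.0, d=d)
    print(f"bulk: {bulk:.1f} cm^2/Vs, near interface: {surface:.1f} cm^2/Vs")
```

Because reciprocals are added, the smallest contribution dominates the result, and the surface terms fade out exponentially with distance from the interface.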

We have run simulations with different models and found that the Masetti model for
doping-dependent mobility degradation, the Conwell-Weisskopf model for carrier-carrier
scattering, and the Canali model for high field saturation give the best results. In the
Masetti model, $\mu_{dop}$ is expressed as:

$$\mu_{dop} = \mu_{min1}\exp\left(-\frac{P_c}{N_i}\right) + \frac{\mu_{const}-\mu_{min2}}{1+(N_i/C_r)^{\alpha}} - \frac{\mu_1}{1+(C_s/N_i)^{\beta}} \qquad (4)$$

In Eq. (4), $N_i$ is the total doping concentration, $\mu_{const}$ is the mobility at low doping
levels, and the other parameters are fit parameters. In the Conwell-Weisskopf model,
$\mu_{eh}$ is expressed as:

$$\mu_{eh} = \frac{D\,(T/T_0)^{3/2}}{\sqrt{np}}\left[\ln\left(1 + F\left(\frac{T}{T_0}\right)^{2}(pn)^{-1/3}\right)\right]^{-1} \qquad (5)$$

In Eq. (5), $n$ and $p$ are the electron and hole densities, $T_0$ = 300 K, $T$ denotes the lattice
temperature, and $D$ and $F$ are fit parameters of this model (not the damping factor of
Eq. (3) or the electric field of Eq. (1)). In the Canali model, high field mobility
degradation is expressed as:



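$$\mu(F) = \frac{\mu_{low}}{\left[1 + \left(\dfrac{\mu_{low}F}{v_{sat}}\right)^{\beta}\right]^{1/\beta}} \qquad (6)$$

In Eq. (6), $\mu_{low}$ is the low field mobility, and $v_{sat}$ and $\beta$ are temperature-dependent
parameters expressed as:

$$v_{sat} = v_{sat,0}\left(\frac{T_0}{T}\right)^{v_{sat,exp}}, \qquad \beta = \beta_0\left(\frac{T}{T_0}\right)^{\beta_{exp}} \qquad (7)$$

In Eq. (7), except for $T_0$ and $T$, all of the parameters are fit parameters.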
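
As a rough numerical illustration of high field saturation, the sketch below evaluates a Canali-type expression, mu_low / [1 + (mu_low F / v_sat)^beta]^(1/beta), with a saturation velocity that is assumed to fall with temperature as (T0/T)^exp. The default parameter values are commonly quoted figures for electrons in silicon and are used here only as assumptions; they should be replaced by the values calibrated for the process at hand.

```python
def canali_mobility(mu_low, field, temp=300.0,
                    vsat0=1.07e7, vsat_exp=0.87,
                    beta0=1.109, beta_exp=0.66, t0=300.0):
    """High-field mobility (cm^2/Vs) for a field in V/cm and temperature in K.

    All default parameters are illustrative assumptions, not calibrated values.
    """
    vsat = vsat0 * (t0 / temp) ** vsat_exp   # saturation velocity, cm/s
    beta = beta0 * (temp / t0) ** beta_exp   # shape exponent
    return mu_low / (1.0 + (mu_low * field / vsat) ** beta) ** (1.0 / beta)


if __name__ == "__main__":
    for field in (1e3, 1e4, 1e5, 1e6):
        mu = canali_mobility(1400.0, field)
        print(f"F = {field:.0e} V/cm  mu = {mu:8.1f} cm^2/Vs  v = {mu * field:.3e} cm/s")
```

At low field the function returns the low field mobility, and at very high field the drift velocity mu*F approaches the saturation velocity, which is the behavior the model is meant to capture.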

Lifetimes of free carriers are governed by the recombination models. The SRH recombination
rate and the Auger recombination rate are given in Eq. (8) and Eq. (9), respectively.

$$R_{SRH} = \frac{np - \gamma_n\gamma_p n_{i,eff}^{2}}{\tau_p(n + \gamma_n n_1) + \tau_n(p + \gamma_p p_1)} \qquad (8)$$

$$R_{A} = (C_n n + C_p p)(np - n_{i,eff}^{2}) \qquad (9)$$

In Eq. (8), $n_{i,eff}$ is the effective intrinsic carrier density, $\gamma_n$ and $\gamma_p$ are correction
parameters for Fermi statistics, and $n_1$ and $p_1$ are expressed as:

$$n_1 = n_{i,eff}\exp\left(\frac{E_{trap}}{kT}\right), \qquad p_1 = n_{i,eff}\exp\left(-\frac{E_{trap}}{kT}\right) \qquad (10)$$

where $E_{trap}$ is the difference between the defect level and the intrinsic level. The silicon
default value is $E_{trap}$ = 0. In Eq. (8), $\tau_n$ and $\tau_p$ are temperature- and field-dependent
parameters, expressed as:

$$\tau_c = \tau_{dop}\,\frac{f(T)}{1 + g_c(F)}, \qquad c = n, p \qquad (11)$$

The component $[1 + g_c(F)]^{-1}$ in Eq. (11) is a field enhancement factor. $\tau_{dop}$ and $f(T)$ are
expressed as:

$$\tau_{dop}(N_i) = \tau_{min} + \frac{\tau_{max}-\tau_{min}}{1 + (N_i/N_{ref})^{\gamma}}, \qquad f(T) = \left(\frac{T}{300\,\mathrm{K}}\right)^{T_\alpha} \qquad (12)$$

Except for $N_i$ and $T$, the other parameters in Eq. (12) are all fit parameters.

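The Auger recombination rate is formulated in Eq. (9), in which the temperature-dependent
coefficients $C_n$ and $C_p$ are expressed as:

$$C_n(T) = \left[A_{A,n} + B_{A,n}\frac{T}{T_0} + C_{A,n}\left(\frac{T}{T_0}\right)^{2}\right]\left[1 + H_n\exp\left(-\frac{n}{N_{0,n}}\right)\right]$$

$$C_p(T) = \left[A_{A,p} + B_{A,p}\frac{T}{T_0} + C_{A,p}\left(\frac{T}{T_0}\right)^{2}\right]\left[1 + H_p\exp\left(-\frac{p}{N_{0,p}}\right)\right] \qquad (13)$$

Except for $T$ and the carrier densities, all other parameters in Eq. (13) are fit parameters.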
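
The two recombination rates can be evaluated directly once the lifetimes and Auger coefficients are known. The sketch below is a minimal illustration in which the lifetimes and coefficients are passed in as plain numbers instead of being computed from their doping, temperature and field dependences; the function names and the example values are assumptions made for the illustration.

```python
import math

K_B_EV = 8.617333e-5  # Boltzmann constant in eV/K


def srh_rate(n, p, ni_eff, tau_n, tau_p, e_trap_ev=0.0, temp=300.0,
             gamma_n=1.0, gamma_p=1.0):
    """SRH net recombination rate (cm^-3 s^-1) for densities in cm^-3."""
    kt = K_B_EV * temp
    n1 = ni_eff * math.exp(e_trap_ev / kt)
    p1 = ni_eff * math.exp(-e_trap_ev / kt)
    numerator = n * p - gamma_n * gamma_p * ni_eff ** 2
    return numerator / (tau_p * (n + gamma_n * n1) + tau_n * (p + gamma_p * p1))


def auger_rate(n, p, ni_eff, c_n, c_p):
    """Auger net recombination rate (cm^-3 s^-1)."""
    return (c_n * n + c_p * p) * (n * p - ni_eff ** 2)


if __name__ == "__main__":
    # Placeholder values: high injection, 10 us lifetimes, 1e-31 cm^6/s coefficients.
    n = p = 1e18
    print(f"SRH:   {srh_rate(n, p, 1e10, 1e-5, 1e-5):.3e} cm^-3/s")
    print(f"Auger: {auger_rate(n, p, 1e10, 1e-31, 1e-31):.3e} cm^-3/s")
```

With these placeholder numbers the Auger rate already exceeds the SRH rate at a carrier density of 1e18 cm^-3, which illustrates why both mechanisms have to be included at the injection levels reached during an ESD event.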

Another important physical parameter is the impact ionization generation rate $G$, which is
formulated as $G = \alpha_n n v_n + \alpha_p p v_p$, where $v_{n,p}$ denotes the drift velocity. $\alpha_{n,p}$ is
described by many models, among which the van Overstraeten-de Man model proved to be the
best. In this model, $\alpha_{n,p}$ is formulated as:

$$\alpha(F) = \gamma\,a\,\exp\left(-\frac{\gamma b}{F}\right), \qquad \gamma = \frac{\tanh\left(\dfrac{\hbar\omega_{op}}{2kT_0}\right)}{\tanh\left(\dfrac{\hbar\omega_{op}}{2kT}\right)} \qquad (14)$$

Two sets of the coefficients $a$ and $b$ are used for the high and low ranges of the electric
field, which are distinguished by a parameter $E_0$ whose default value is $4\times10^{5}$
V/cm. In the low range of electric field, below $E_0$, the values $a$(low) and $b$(low) are
applied, while in the high range, above $E_0$, the values $a$(high) and $b$(high) are used. The
parameter $\hbar\omega_{op}$ represents the optical phonon energy.
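
The switching between the low-field and high-field coefficient sets can be illustrated with a short sketch. It assumes the usual form of the van Overstraeten-de Man law, alpha = gamma * a * exp(-gamma * b / F) with a tanh-based temperature factor gamma, and uses commonly quoted silicon values (including an optical phonon energy of 0.063 eV) purely as example inputs. The helper `jump_at_e0` is a hypothetical addition that measures how discontinuous alpha becomes at E0 for a given parameter set.

```python
import math

K_B_EV = 8.617333e-5  # Boltzmann constant in eV/K


def _alpha(field, temp, a, b, hw_op_ev=0.063, t0=300.0):
    """alpha = gamma * a * exp(-gamma * b / F); field in V/cm, result in 1/cm."""
    gamma = (math.tanh(hw_op_ev / (2.0 * K_B_EV * t0))
             / math.tanh(hw_op_ev / (2.0 * K_B_EV * temp)))
    return gamma * a * math.exp(-gamma * b / field)


def ionization_coefficient(field, temp, a_low, a_high, b_low, b_high, e0=4e5):
    """Pick the low-field or high-field coefficient set depending on E0."""
    if field < e0:
        return _alpha(field, temp, a_low, b_low)
    return _alpha(field, temp, a_high, b_high)


def jump_at_e0(temp, a_low, a_high, b_low, b_high, e0=4e5):
    """Ratio alpha(high set) / alpha(low set) at F = E0; 1.0 means continuous."""
    return _alpha(e0, temp, a_high, b_high) / _alpha(e0, temp, a_low, b_low)


if __name__ == "__main__":
    # Example hole coefficients (assumed values, 1/cm and V/cm).
    holes = dict(a_low=1.582e6, a_high=6.71e5, b_low=2.036e6, b_high=1.693e6)
    print(f"alpha_p(3e5 V/cm) = {ionization_coefficient(3e5, 300.0, **holes):.3e} 1/cm")
    print(f"jump at E0        = {jump_at_e0(300.0, **holes):.4f}")
    # Scaling only the high-field 'a' makes alpha discontinuous at E0.
    holes["a_high"] *= 3.0
    print(f"jump after edit   = {jump_at_e0(300.0, **holes):.4f}")
```

A jump ratio far from 1 means that alpha changes abruptly when the local field crosses E0, so checking this ratio after editing the coefficients is a cheap way to spot a parameter set that is likely to disturb the solver.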

Once the physical models have been chosen, the fit parameters mentioned above should be
adjusted, and then the simulation can be carried out. The most difficult
problem in the simulation is usually convergence. The causes of convergence problems and
their solutions are discussed next.

In our simulation practice, convergence problems are mostly caused by five
factors: 1) too few iterations; 2) a bad initial guess; 3) an unsuitable numerical solution
method; 4) a coarse mesh or a bad boundary shape; 5) a bad parameter set for the physical
models.

Fig. 2 shows the simulation flow of the device simulator. The parameters "Notdamped" and
"Iterations" determine when the simulation is terminated, so values that are too small for
these two parameters cause abnormal termination. However, this rarely happens
because the default values of the two parameters are large enough in most cases.

Fig. 2. Device simulation flow

Fig. 2 shows that all calculations start from an initial guess, and a bad
initial guess will very likely cause a convergence problem. This typically happens in two
situations. First, the simulation sometimes has to be divided into subsections: in some
regions a small value of "initialstep" should be used to obtain a good initial guess, while
in other regions a large value of "initialstep" should be used to save time. Using a large
value of "initialstep" in the wrong region may cause the first point to fail to converge. To
prevent this problem, the simulation should be divided into subsections in a reasonable
way. Second, a large initial voltage imposed on an electrode also causes convergence
problems. In this case, the initial voltage at the electrode should be set to 0 V and then
ramped to the required value, which gives good convergence. The commands in Fig. 3a are very
likely to cause convergence problems, while the commands in Fig. 3b generally converge well.

In the snapback region of an ESD protection structure, the current increases rapidly. In the
simulation, a small $\Delta V$ therefore induces a large $\Delta I$, which can cause the simulation to
fail to converge. To solve this problem, the simulator provides a particular simulation
method, shown in Fig. 4: a resistor is connected in series with the ESD protection structure,
so that the current can be written as $I = (V_{out} - V_{internal})/R$. In this way, each voltage
step produces only a small $\Delta I$, which improves the convergence. This method must be
used in the simulation of ESD events, and the value of R is generally set larger than
$1\times10^{7}\ \Omega$.
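
The effect of the series resistor can be seen with a toy example. The sketch below uses an invented piecewise-linear snapback curve (all breakpoints are hypothetical, not simulation results) and checks whether the externally applied voltage V_out = V_internal + I*R rises monotonically along the curve. Without the resistor the swept voltage folds back at the trigger point; with a large R every step in V_out corresponds to one small, well-defined step in current.

```python
def device_voltage(current):
    """Toy piecewise-linear snapback curve V(I); all breakpoints are hypothetical."""
    points = [(0.0, 0.0), (1e-6, 16.0), (1e-3, 2.2), (6e-2, 5.0)]
    for (i0, v0), (i1, v1) in zip(points, points[1:]):
        if current <= i1:
            return v0 + (v1 - v0) * (current - i0) / (i1 - i0)
    return points[-1][1]


def external_voltage(current, r_series):
    """Voltage at the outer node: device voltage plus the drop over the resistor."""
    return device_voltage(current) + current * r_series


def is_monotonic(r_series, currents):
    """True if V_out strictly increases along the curve for the given currents."""
    volts = [external_voltage(i, r_series) for i in currents]
    return all(later > earlier for earlier, later in zip(volts, volts[1:]))


if __name__ == "__main__":
    currents = [k * 1e-5 for k in range(6001)]  # 0 A to 0.06 A
    print("R = 0 ohm   :", is_monotonic(0.0, currents))
    print("R = 1e7 ohm :", is_monotonic(1e7, currents))
```

The resistor only has to exceed the magnitude of the negative differential resistance in the snapback region, which is why a very large value is a safe choice.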
Electrode {
    { Name="drain" Voltage=0.0 }
    { Name="source" Voltage=0.0 }
    { Name="gate" Voltage=5.0 }
    { Name="sub" Voltage=0.0 }
}

(a)

Electrode {
    { Name="drain" Voltage=0.0 }
    { Name="source" Voltage=0.0 }
    { Name="gate" Voltage=0.0 }
    { Name="sub" Voltage=0.0 }
}

Solve{ }

Goal {name="gate" voltage=5.0V}

(b)

Fig. 3. (a) Commands that are hard to converge, (b) Commands with good convergence

Fig. 4. ESD simulation method

A coarse mesh or a bad boundary shape will also cause convergence problems. Fig. 5 shows a
comparison of a bad boundary shape and a good boundary shape. The sharp-angled region
in Fig. 5a will cause convergence problems in the later device simulation.
It is mainly caused by unsuitable diffusion and implantation models in the process
simulation. We found that the Pairdiffusion model for diffusion, combined with implantation
tables based on Crystal-TRIM for implantation, always provides a good boundary shape.

Fig. 5. (a) Bad boundary shape, (b) Good boundary shape

Another reason for convergence problems is a bad parameter set for device simulation. A
small value of the parameter $a$ in Eq. (14) and a large value of the parameter $\tau_{max}$ in
Eq. (12) may result in a convergence problem in which the current fails to increase near the
breakdown region. In addition, a large difference between the values of $a$ in the low field
region and the high field region may cause the simulation to fail to converge after snapback,
as shown in Fig. 6. When the curve snaps back, the simulation changes from the high
field condition to the low field condition, and the sudden change in the value of $a$
causes the convergence problem. Therefore, when modifying the parameters, large
differences between $a$(low) and $a$(high), and between $b$(low) and $b$(high), must be avoided.

Fig. 6. Simulation fails to converge after the snapback happens

4. ESD simulation methods 

There are three main methods to simulate the I-V characteristic of an ESD protection device:
DC simulation, TLP simulation and mixed-mode simulation. DC simulation is the
fastest, but it faces the most serious convergence problems.
TLP simulation and mixed-mode simulation can both reflect the transient
characteristics of devices. In this section, DC simulation and traditional TLP simulation and
their limitations are illustrated first. Then a new simulation method based on the traditional
TLP simulation method is proposed, which can predict the key parameters of ESD protection
devices precisely. Mixed-mode simulation, which is carried out in the TSUPREM4/MEDICI
environment, is illustrated separately, and a method to evaluate the effectiveness,
robustness, speed and transparency of ESD protection devices is proposed.

To illustrate the DC and TLP simulation methods, a traditional LSCR (lateral
silicon-controlled rectifier), shown in Fig. 7, is considered, in which D1 is 1.5 μm, D2 is
0.5 μm, D3 is 0.6 μm, and D4 is 1 μm. Fig. 8 shows the doping profile simulated by DIOS, and
the total concentration of the different layers is listed in Table 2.

Fig. 7. A cross section of LSCR

Fig. 8. Doping profile of LSCR

                      (unlabeled)  NWELL      PWELL      N+         P+
Total concentration   1×10^15      3.7×10^17  2.6×10^17  5.1×10^20  2.4×10^20

Table 2. Total concentration of various layers

The structure obtained from the process simulation is then imported into the device
simulator, where the device simulation can be carried out in two ways. To evaluate the
trigger voltage ($V_{t1}$), the holding voltage ($V_h$), and the second breakdown current ($I_{t2}$)
precisely, selecting proper physical models and parameters is the key point. Table 3 lists the
parameters modified in the simulation; parameters not mentioned in the table
keep their default values. The value of the parameter $a$ in Eq. (14) determines $V_{t1}$, while
the values of $\mu$ in Eq. (1) and $\tau$ in Eq. (11) are crucial for $V_h$.



Parameter   Value       Value for electron   Value for hole   Mentioned in
b(low)      -           9.85×10^5            1.629×10^6       Eq. (14)
b(high)     -           9.85×10^5            1.354×10^6       Eq. (14)
F           1×10^[?]    -                    -                Eq. (5)
Cr          -           9×10^[?]             1.5×10^17        Eq. (4)

Table 3. Parameter set in the simulation

In fact, traditional TLP simulation cannot evaluate the DC characteristic of ESD protection
devices because of the voltage overshoot. Fig. 9 (a) shows the current pulse imposed on the
simulated device, and Fig. 9 (b) shows the corresponding I-V curve compared with the
TLP test result. From Fig. 9 (b), we can see that the simulation result deviates
considerably from the test result.

DC simulation can evaluate $V_{t1}$ and $V_h$, but it cannot evaluate $I_{t2}$ precisely. DC
simulation is based on solving thermal equilibrium equations, but in fact no thermal
equilibrium is established in the structure when the ESD event happens. Therefore, DC
simulation can no longer describe the characteristic of ESD events when the temperature
rises well above 300 K. The non-equilibrium state can only be described by a transient
simulation. Fig. 10 shows the result of the DC simulation together with the TLP test result.

Fig. 10. Comparison of DC simulation and TLP test result

To evaluate the performance of ESD protection devices, $V_{t1}$, $V_h$, and $I_{t2}$ are all
indispensable. Based on traditional TLP simulation, we propose a novel TLP simulation
method that can simulate all three parameters precisely. First, we verify
that this method can evaluate $V_{t1}$ and $V_h$. In the novel TLP simulation, a series of
current pulses is imposed on the structure, as shown in Fig. 11 (a). The resulting voltage vs.
time curves are shown in Fig. 11 (b). The average current over the 70%~90% portion of the
pulse duration is then calculated for each I-t curve, and the average voltage is calculated
in the same way, just as in a TLP measurement. Each pair of voltage and current is then
plotted as a point in Fig. 12. When these points are connected and compared with the test
results, the two curves agree very well.
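
The averaging step that turns each simulated pulse into one point of the quasi-static I-V curve can be sketched as follows. The function names are hypothetical, the waveform in the usage example is synthetic, and the plain arithmetic mean assumes uniformly spaced time samples.

```python
def window_average(times, values, start_frac=0.7, stop_frac=0.9):
    """Mean of the samples that fall in the 70%~90% portion of the pulse.

    Assumes uniformly spaced samples; times[0] and times[-1] mark the pulse.
    """
    t_begin, t_end = times[0], times[-1]
    lower = t_begin + start_frac * (t_end - t_begin)
    upper = t_begin + stop_frac * (t_end - t_begin)
    selected = [v for t, v in zip(times, values) if lower <= t <= upper]
    if not selected:
        raise ValueError("no samples inside the averaging window")
    return sum(selected) / len(selected)


def tlp_point(times, voltages, currents):
    """One (V, I) point of the TLP curve from a single simulated pulse."""
    return window_average(times, voltages), window_average(times, currents)


if __name__ == "__main__":
    # Synthetic 100 ns pulse: voltage overshoot during the first 10 ns.
    times = [float(t) for t in range(101)]
    voltages = [20.0 if t < 10 else 2.0 for t in times]
    currents = [0.05] * len(times)
    print(tlp_point(times, voltages, currents))
```

Because only the late part of the pulse is averaged, the initial voltage overshoot does not enter the extracted point.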

Table 4 lists the TLP test results and the simulation results obtained with the DC simulation
method and the novel TLP simulation method. The DC simulation method and the novel TLP
simulation method give almost the same results for $V_{t1}$ and $V_h$.

Fig. 11. (a) A series of current pulses is imposed on the simulated structure, and the average
current over the 70%~90% section of each curve is calculated. (b) Voltage vs. time curves
obtained from the simulation; the average voltage over the 70%~90% section of each
curve is calculated.

Method                 Vt1 (V)  Absolute error (V)  Relative error  Vh (V)  Absolute error (V)  Relative error
TLP test               16       -                   -               2.16    -                   -
Novel TLP simulation   15.69    0.31                1.94%           2.03    0.13                6.02%
DC simulation          15.69    0.31                1.94%           2.02    0.14                6.48%

Table 4. Test result and simulation results
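
The error columns of Table 4 follow directly from the measured and simulated values: the absolute error is the magnitude of their difference and the relative error is that difference divided by the TLP test value. A small sketch with a hypothetical helper name:

```python
def errors(measured, simulated):
    """Return (absolute error, relative error) of a simulated value."""
    absolute = abs(measured - simulated)
    return absolute, absolute / measured


if __name__ == "__main__":
    rows = [("Vt1, both methods", 16.0, 15.69),
            ("Vh, novel TLP", 2.16, 2.03),
            ("Vh, DC", 2.16, 2.02)]
    for label, measured, simulated in rows:
        absolute, relative = errors(measured, simulated)
        print(f"{label:18s} abs = {absolute:.2f} V  rel = {relative:.2%}")
```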

To evaluate $I_{t2}$, current pulses with peak values of 0.04 A, 0.05 A, 0.06 A, 0.066 A, 0.068 A,
0.07 A, 0.08 A and 0.09 A are imposed on the structure. The points obtained from these
simulations, together with the points obtained before, give the whole curve shown in Fig. 13,
from which we can see that the voltage snaps back when the current reaches 0.066 A.
This current is taken as $I_{t2}$.

Fig. 13. $I_{t2}$ obtained from novel TLP simulation and that from TLP test

$I_{t2}$ can also be evaluated from the maximum temperature in the structure, since thermal
breakdown is ultimately caused by high temperature. After the simulation, we obtain
$T_{max}$ vs. time curves, as shown in Fig. 14. When the maximum value of $T_{max}$ exceeds the
melting point of Si (1687 K), thermal breakdown is judged to have happened. From Fig. 14,
we can see that $I_{t2}$ is about 0.064 A.
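
The temperature criterion amounts to a simple search over the simulated pulses: the smallest pulse current whose peak lattice temperature exceeds the melting point of silicon is taken as the second breakdown current. The sketch below uses a hypothetical function name and invented temperature traces; only the 1687 K threshold comes from the text.

```python
SI_MELTING_POINT_K = 1687.0


def it2_from_temperature(tmax_traces, limit_k=SI_MELTING_POINT_K):
    """Smallest pulse current whose Tmax(t) trace exceeds the limit, or None.

    tmax_traces maps a pulse current (A) to a list of Tmax samples (K).
    """
    for current in sorted(tmax_traces):
        if max(tmax_traces[current]) > limit_k:
            return current
    return None


if __name__ == "__main__":
    # Invented traces for illustration only.
    traces = {
        0.060: [300.0, 900.0, 1500.0],
        0.064: [300.0, 1200.0, 1700.0],
        0.066: [300.0, 1500.0, 2000.0],
    }
    print(it2_from_temperature(traces))
```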
Fig. 14. Maximum temperature in the structure vs. time curves when series of current pulses
are imposed on the structure.

Table 5 lists the test result, the result simulated with the novel TLP simulation method and
judged by the voltage snapback, and the result simulated with the novel TLP simulation
method and judged by the maximum temperature in the structure.

Method                           It2 (A/μm)  Absolute error (A/μm)  Relative error
TLP test                         0.068       -                      -
Judged by voltage's snapback     0.066       0.002                  2.94%
Judged by maximum temperature    0.064       0.004                  5.88%

Table 5. Test and simulation results

From the discussion above, we conclude that the most effective and fastest way to
evaluate the performance of ESD protection devices is to evaluate $V_{t1}$ and $V_h$ with the DC
simulation method, and to evaluate $I_{t2}$ with the novel TLP simulation method introduced
above.

Next, the mixed-mode simulation method is introduced, taking the CDM model as an
example. The equivalent circuit of the CDM model is shown in Fig. 15. The device to be
evaluated is an MLSCR, shown in Fig. 16, and the doping profile obtained by simulation with
TSUPREM4 is shown in Fig. 17.

4.1 Effectiveness evaluation

From the current vs. time curve obtained from the mixed-mode simulation, shown in
Fig. 18, we can see that the ESD current is completely discharged through the device within
2.5 ns. This time and the peak current at the $T_{Imax}$ point reflect the effectiveness of the
device. A shorter time and a larger peak current mean that the device can discharge a larger
current in less time; in other words, the device is more effective.

[Figure: Transient Current vs. Time of CDM Models; x-axis: Time (ns)]

Fig. 18. Current vs. time curve 

4.2 Speed evaluation 

From the voltage vs. time curve shown in Fig. 19, we evaluate the speed using the recovery
time. The recovery time is defined as the time in which the device voltage rises quickly and
then returns to the normal working voltage; it is marked as $T_{recover}$ in Fig. 19. A
smaller value of $T_{recover}$ shows that the ESD protection device reacts faster to the
electrostatic signal.



[Figure: voltage vs. time; annotation: Trecover = 3.43E-10 s, V(Anode) = 5 V; x-axis: Time (ns)]

Fig. 19. Voltage vs. time curve 










[Figure (a): Pmax vs. Time of CDM Model, with time points T5, T6 and T7 marked; x-axis: Time (ns)]




Fig. 20. (a) Pmax-t, (b) Rectangular box heat source model (Zoom out), (c) Rectangular box 
heat source model (Zoom in) 

4.3 Robustness evaluation 

Two aspects should mainly be considered when evaluating the robustness: the first
is to check whether the electro-thermal behavior becomes uncontrollable when
the instantaneous power of the ESD event reaches its maximum ($P_{max}$); the second is to
inspect the power distribution in the ESD protection device when the ESD event happens. Using
the $P_{max}$-t curve in Fig. 20 (a) and the rectangular box heat source model of
Ajith Amerasekera, a modified rectangular box heat source model is proposed to evaluate
the robustness of the SCR protection device. In the modified model, the power is assumed
to be concentrated in a cuboid whose three side lengths are a, b and c, as shown
in Fig. 20 (b) and Fig. 20 (c). Defining $P_{normalized}(t) = \left(\int_0^t P_{max}(t')\,dt'\right)/t$, the power
injected into the SCR device is $P(t) = abc\,R(t)\,P_{normalized}(t)$, where $R(t)$ is a fitting
parameter ($0 < R(t) < 1$), and $R(t)P_{normalized}(t)$ is the average power density of the
rectangular box heat source. The relationship between the temperature rise $\Delta T(t)$ (the
highest temperature at that time is $T_{max} = T_0 + \Delta T$, where $T_0$ is the initial
temperature) and $P(t)$ is a piecewise function given by Eqs. (15) to (18):



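$$P = \frac{\rho\,abc\,C_p\,\Delta T}{t} \qquad (0 < t \le t_c) \qquad (15)$$

$$P = \frac{ab\sqrt{\pi K \rho C_p}\,\Delta T}{\sqrt{t} - \sqrt{t_c}/2} \qquad (t_c < t \le t_b) \qquad (16)$$

$$P = \frac{4\pi K a\,\Delta T}{\log_e(t/t_b) + 2 - c/b} \qquad (t_b < t \le t_a) \qquad (17)$$

$$P = \frac{2\pi K a\,\Delta T}{\log_e(a/b) + 2 - c/(2b) - \sqrt{t_a/t}} \qquad (t > t_a) \qquad (18)$$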



In these equations, $K$ is the thermal conductivity, $C_p$ is the specific heat capacity, $\rho$ is
the density of silicon, $D = K/(\rho C_p)$, $t_c = c^2/(4\pi D)$, $t_b = b^2/(4\pi D)$ and
$t_a = a^2/(4\pi D)$; $K$, $C_p$ and $\rho$ depend on the process. We can therefore calculate the highest
temperature at every time point, and then calculate the density $n_d$ of thermally generated
carriers at that temperature. If $n_d$ exceeds the
background impurity concentration, the robustness of the device is insufficient. The
conversion is given by Eq. (19):

$$n_d = 1.69\times10^{19}\,\exp\left(-\frac{6.377\times10^{3}}{T_{max}}\right)\left(\frac{T_{max}}{300}\right)^{3/2} \qquad (19)$$



The check of whether the device enters an electro-thermally uncontrollable condition,
based on the $P_{max}$-t curve as described above, can also be implemented quickly with
mathematical software such as Matlab.
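
As one possible way to script this check, the sketch below evaluates the temperature rise of a rectangular box heat source for a constant injected power, using the four time regimes separated by t_c, t_b and t_a, and then converts the peak temperature into a thermally generated carrier density. It is written in Python rather than Matlab. The material constants (K = 1.5 W/cm/K, rho = 2.33 g/cm^3, C_p = 0.7 J/g/K), the box dimensions and the power in the usage example are assumptions for illustration; the text notes that the real values depend on the process.

```python
import math


def delta_t(power_w, t, a, b, c, k=1.5, rho=2.33, cp=0.7):
    """Temperature rise (K) of a box heat source with sides a >= b >= c (cm).

    power_w is the injected power (W), t the time (s). The material constants
    are illustrative silicon values, not process-calibrated ones.
    """
    d = k / (rho * cp)  # thermal diffusivity, cm^2/s
    tc, tb, ta = (s * s / (4.0 * math.pi * d) for s in (c, b, a))
    if t <= tc:
        return power_w * t / (rho * a * b * c * cp)
    if t <= tb:
        return (power_w * (math.sqrt(t) - math.sqrt(tc) / 2.0)
                / (a * b * math.sqrt(math.pi * k * rho * cp)))
    if t <= ta:
        return power_w * (math.log(t / tb) + 2.0 - c / b) / (4.0 * math.pi * k * a)
    return (power_w * (math.log(a / b) + 2.0 - c / (2.0 * b) - math.sqrt(ta / t))
            / (2.0 * math.pi * k * a))


def thermal_carriers(t_max):
    """Thermally generated carrier density (cm^-3) at temperature t_max (K)."""
    return 1.69e19 * math.exp(-6.377e3 / t_max) * (t_max / 300.0) ** 1.5


if __name__ == "__main__":
    a, b, c = 50e-4, 2e-4, 0.2e-4   # hypothetical box: 50 um x 2 um x 0.2 um
    background = 1e17               # hypothetical background doping, cm^-3
    for t in (1e-10, 1e-9, 1e-8, 1e-7):
        t_max = 300.0 + delta_t(0.5, t, a, b, c)
        n_d = thermal_carriers(t_max)
        print(f"t = {t:.0e} s  Tmax = {t_max:7.1f} K  n_d = {n_d:.2e}  "
              f"exceeds background: {n_d > background}")
```

The four expressions join continuously at t_c, t_b and t_a, so the temperature rise can be evaluated at every time point of the P_max-t curve without special handling of the regime boundaries.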

The internal power distribution of the ESD protection device during an ESD event
also reflects the robustness of the device. A robust ESD protection device
should spread the dissipated power as widely as possible, especially when the
peak power is very large. Fig. 21 shows the power distribution when the power reaches
its peak.

Fig. 21. The power distribution when the power comes to its peak

4.4 Transparency evaluation

To evaluate DC transparency, we inspect the leakage current at bias voltages up to 1.2 VDD
(Fig. 22 (a)). To evaluate transparency to AC signals, we inspect the leakage current at the
I/O signal frequency (taking a 100K rectangular wave as an
example, see Fig. 22 (b)). The leakage current under an AC signal is larger than that under
DC voltage, mainly because of high-frequency coupling.

Fig. 22. (a) DC leakage current of the SCR-based ESD protection device, (b) Leakage current
of the SCR-based ESD protection device under 100K frequency signal



4.5 Overall evaluation 

Finally, we obtain the transient curve [I(t), V(t)], which describes the entire ESD event, as 
shown in Fig. 23. From this curve we can make a comprehensive evaluation of the 
effectiveness, speed, robustness and transparency of the ESD protection device. The time 
points in the figure are ordered as T0 < T3 = T5 < T6 < T7 < T1 < Trecover < T4 < T2. The 
current value at T1 reflects the effectiveness of the ESD protection device, and Trecover 
reflects its trigger speed. The hyperbola family in the figure represents the power of the 
ESD protection device, and the distance from the hyperbola family to the origin reflects its 
robustness. The power density extremum also reflects the robustness: the maximum power 
density of the device reaches its peak at 1E-11 s. The current at which the device first 
reaches 5 V in an ESD event reflects its transparency. An ideal transient curve should lie 
close to the vertical axis, with most of its points to the left of the line V = VDD. 
[Plot title: Transient I-V Curve of CDM Model] 
Fig. 23. Transient I(t) versus transient V(t) of SCR-based ESD protection device 

5. ESD protection element characteristic evaluation based on SPICE 
simulation 

5.1 SPICE Simulation based design-transient power clamp 

As technology scales down, the gate oxide becomes thinner and more vulnerable to ESD, and 
the resistance of the routing rail metal increases noticeably. Traditional rail-based static 
ESD power clamp protection (Fig. 24) therefore becomes more challenging. The transient 
power clamp, which consists of an RC-network-based detection circuit and a main NMOS 
ESD device (Fig. 25), is increasingly attractive for its fast turn-on speed and low turn-on 
voltage. Its key advantage is compatibility with SPICE simulation, which enables 
optimization in the pre-silicon phase. A major drawback is that the large RC network needed 
to trigger the main protection device responds to any fast event on the power rails. 
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Fig. 24. Rail-based ESD protection scheme with power clamp 
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Fig. 25. Three-stage inverter based transient ESD power clamp 

The transient power clamp uses the RC network to detect the ESD event and turn on the 
main NMOS ESD protection device (Fig. 25), which shunts the ESD current from the supply 
pin. The main NMOS conducts the ESD current through its channel, and this conduction can 
be simulated in SPICE. Because the peak HBM current is on the order of amperes, the main 
NMOS must be large enough to shunt the ESD current safely; its width is typically on the 
order of millimeters. Under normal conditions, the gate of the NMOS is low and the main 
protection device is off. The rise time of an ESD event is between 100 ps and 60 ns, whereas 
the rise time of a power-up is in the millisecond range. To keep the main protection device 
on, the RC time constant is set longer than the duration of the ESD event, which is about 
1 µs for HBM ESD stress, and shorter than the power-on rise time. A typical RC time 
constant is 1 µs. Such a large RC time constant not only consumes considerable silicon area 
but also makes the clamp susceptible to power bus noise. 
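
The timing constraint on the RC network can be written as a simple window check. The 
following Python sketch is illustrative only: the function name, the resistor and capacitor 
values, and the 1 ms power-up rise time are assumptions chosen for the example, not values 
taken from the design. 

```python
def rc_window(r_ohm, c_farad, esd_duration_s=1e-6, power_up_rise_s=1e-3):
    """Return (tau, ok) for an RC detection network.

    tau is the RC time constant in seconds. ok is True when tau is at least
    the ESD event duration and shorter than the power-up rise time.
    The default limits are assumed example values.
    """
    tau = r_ohm * c_farad
    ok = esd_duration_s <= tau < power_up_rise_s
    return tau, ok


# Hypothetical component values, for illustration only
tau, ok = rc_window(r_ohm=2e6, c_farad=1e-12)
print(f"tau = {tau * 1e6:.2f} us, inside window: {ok}")
```

A time constant far below the ESD duration would release the clamp too early, while one 
approaching the power-up rise time would let the clamp trigger during normal power-on. 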




Fig. 26. Proposed three-stage inverter based ESD power clamp with feedback 

M0 is the main protection NMOS that shunts the ESD current. M1~M6 form the three-stage 
inverter. The signal at node V1 propagates through the three-stage inverter to control the 
gate of the main device M0. M8-M10 form the resistor and M11 is the NMOS capacitor. M7 
is the feedback NMOS and R is the pull-down resistor. Under normal conditions, node V1 
charges up to VDD and V2 is low. The pull-down resistor R ties the node to VSS, which 
keeps the feedback NMOS off. The voltage at node V2 propagates through the remaining 
two inverter stages so that node V4 is low and M0 is off. The low voltage at node V4 also 
reduces the leakage of M0. Under ESD conditions, the voltage at node V1 stays low because 
of the RC delay. M5 turns on and node V2 is charged to VDD. The high voltage at V2 turns 
on the feedback NMOS M7, which pulls node V1 to VSS. The low voltage at node V1 in turn 
strengthens the pull-up of the PMOS M5. The high voltage at node V2 propagates through 
the two inverter stages and turns on M0, which shunts the ESD current. The feedback 
significantly extends the time for which V4 stays high, so the RC time constant can be 
reduced significantly, which translates into a smaller silicon area. The main advantage is 
that the smaller RC time constant reduces the susceptibility to fast transient events on the 
power lines. The dimensions of the RC network used in the design are listed in Table 6. 



Device    Dimension 
M8        W/L = 7.12 µm / 0.4 µm 
M9        W/L = 7.12 µm / 0.4 µm 
M10       W/L = 7.12 µm / 0.4 µm 
M11       W/L = 1.4 µm / 3.5 µm 

Table 6. RC network device dimensions 

The power clamp is simulated in the Cadence Spectre environment. A simplified RC 
network (Fig. 27) is used to simulate the HBM ESD event. The switches SW1 and SW2 are 
voltage-controlled switches. Before 1 ns, SW2 is on and SW1 is off, and capacitor C1 is 
charged by the voltage source V2. After 1 ns, SW1 is on and SW2 is off, and the capacitor 
discharges through the 1.5 kΩ resistor R2 into the power clamp. 
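
As a rough cross-check of the discharge described above, the HBM current can be 
approximated by a first-order RC decay. The sketch below is a simplification under stated 
assumptions: it uses the 100 pF capacitor and 1.5 kΩ resistor of the HBM network, ignores 
the series inductance and the impedance of the clamp itself, and the function name is 
hypothetical. 

```python
import math


def hbm_current(t_s, v_charge, r_ohm=1500.0, c_farad=100e-12):
    """Approximate HBM discharge current (A) at time t_s after the switch closes.

    Assumes an ideal RC discharge into a short circuit: no series inductance
    and no voltage drop across the device under test.
    """
    if t_s < 0:
        return 0.0
    tau = r_ohm * c_farad  # 150 ns for the default values
    return (v_charge / r_ohm) * math.exp(-t_s / tau)


peak = hbm_current(0.0, 5000.0)  # 5 kV precharge
print(f"approximate peak current: {peak:.2f} A")
```

Under these assumptions the peak current is simply the precharge voltage divided by 
1.5 kΩ, which is consistent with the statement that the HBM peak current is on the order of 
amperes. 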



[Schematic: voltage source, voltage-controlled switches SW1 and SW2, 100 pF capacitor, 
1.5 kΩ resistor, 7.5 nH inductor, DUT] 

Fig. 27. The simplified RC network used to simulate the HBM ESD event. 

Fig. 28 shows the simulated response of the transient power clamp to a 5 kV HBM ESD 
event in a 90 nm process. The width of the main protection device M0 is 3000 µm. The 
gate-oxide breakdown voltage of a 1.0 V core device is about 5 V under DC conditions. The 
transistors in the power clamp are 1.8 V devices, chosen to reduce leakage; their gate-oxide 
breakdown voltage is about 9.5 V under DC conditions. In the simulation, the voltage at the 
gate of M0 stays below the 9.5 V breakdown voltage, and the NMOS remains on for almost 
1 µs. The voltage on the VDD rail also stays below 9.5 V, so the NMOS can safely shunt the 
5 kV HBM ESD current. 

To evaluate the immunity to fast transients, a fast power-on pulse of 100 µs with a rise time 
of 10 µs and a fall time of 10 µs is applied to the power clamp. The pulse voltage is 1.8 V. 
The voltage response is shown in Fig. 29. The peak voltage at node V4 is 0.05 V, and it stays 
at almost 0 V most of the time. The main NMOS therefore remains off, and the power clamp 
is immune to the fast power-on transient. 
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Fig. 28. Simulated voltage at the different node under 5KV HBM ESD event 
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Fig. 29. Simulated voltage at the different node at fast power on state 

A TLP-like pulse with a rise time of 10 ns, a fall time of 10 ns and a pulse width of 100 ns is 
then applied to the power clamp. The pulse voltage is 1.8 V. The results are shown in 
Fig. 30. The voltage at node V4, at the output of the three-stage inverter, is a square-like 
pulse. This ensures that the main NMOS is on for the duration of the pulse and can shunt 
the ESD current safely. 

The transient power clamp is compatible with normal SPICE simulation, which enables 
early optimization in the pre-silicon phase. However, it responds to any fast transient 
event. An example transient power clamp in a 90 nm CMOS process is presented to show 
the design flow, and the example addresses the susceptibility to fast power-on. According 
to the simulation results, the power clamp can withstand a 5 kV HBM ESD event without 
mistriggering on fast power-on. 






[Plot: simulated voltages at nodes V1, V2, V3, V4 and VDD versus time (ns)] 

Fig. 30. Simulated voltage at the different node at TLP like pulse 

5.2 Triggering characteristic evaluation 

The SCR is an efficient ESD protection device for integrated circuits. Much research based 
on TCAD simulation has been devoted to estimating ESD device performance, including 
the trigger voltage (Vt1), holding voltage (Vh) and failure current (It2). However, a precise 
evaluation method based on SPICE does not exist, because SPICE models do not support 
the high-current ESD regime. A technique for evaluating ESD device performance during 
the design of the protection device is therefore needed. In this section, a new technique is 
proposed to evaluate the trigger voltage of an SCR based on SPICE simulation. 



5.2.1 SCR triggering characteristic evaluation 

The equivalent schematic of the SCR is shown in Fig. 31; it consists of a PNP and an NPN 
bipolar junction transistor. The left part of Fig. 31 is an ESD voltage pulse generation 
circuit. There are different ways to trigger an SCR, including voltage triggering, by slowly 
stepping up Vac (the anode-to-cathode voltage) or applying a dV/dt transient, and current 
triggering, by injecting seed currents into the base of the PNP or the NPN. Here a current 
source is used to emulate the base current of the NPN when avalanche breakdown occurs in 
the SCR. The SCR enters the latch-up state once the base current reaches the value that 
starts the internal positive feedback of the SCR. The simulation results are shown in Fig. 32. 
As Fig. 32 shows, the SCR reaches the latch-up state when the base current of the NPN is 
1.3 mA. 




Fig. 31. ESD voltage pulse generation circuit and equivalent schematic of SCR 
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Fig. 32. Simulation results of normal SCR triggering characteristic 

5.2.2 Darlington SCR triggering characteristic evaluation 

Increasing the current gains β of the PNP and NPN transistors helps to reduce the trigger 
voltage of the SCR. A Darlington SCR configuration is shown in Fig. 33. Q2 and Q3 form a 
Darlington transistor, which is equivalent to a single NPN transistor here. A current source 
is again used to emulate the base current, as in the SCR simulation above. The simulation 
results are shown in Fig. 34. The SCR enters the latch-up state when the base current 
reaches 0.37 mA, which is almost one third of the value for the normal SCR. In other words, 
the Darlington-configured SCR needs less base current to trigger into latch-up and 
therefore a lower breakdown voltage to sustain the NPN operation. The triggering 
characteristics of the normal SCR and the Darlington SCR when the base current of the 
NPN is 0.37 mA are compared in Fig. 35. 
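
The reason a Darlington pair needs less base current can be illustrated with the standard 
textbook relation for the combined current gain of an ideal Darlington pair. The sketch 
below is not the simulation setup used here; the gain values are hypothetical, and only the 
two trigger currents (1.3 mA and 0.37 mA) come from the simulation results above. 

```python
def darlington_gain(beta_a, beta_b):
    """Combined current gain of an ideal Darlington pair (textbook relation)."""
    return beta_a * beta_b + beta_a + beta_b


# Hypothetical single-transistor gains, for illustration only
beta_q2, beta_q3 = 10.0, 10.0
print("combined gain:", darlington_gain(beta_q2, beta_q3))

# Trigger currents reported in the text (mA)
i_trigger_normal = 1.3
i_trigger_darlington = 0.37
print("trigger current ratio:", i_trigger_darlington / i_trigger_normal)
```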
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Fig. 33. ESD voltage pulse generation circuit and equivalent schematic of Darlington SCR 
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Fig. 34. Simulation results of Darlington SCR triggering characteristic 
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Fig. 35. Trigger characteristic comparison of normal SCR and Darlington SCR when the base 
current is 0.37mA 
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1. Introduction 

Inductive Power Transfer (IPT) systems have been successfully developed and used to 
replace traditional conductive power transfer systems where a physical connection is either 
inconvenient or impossible, such as in biomedical implants, undersea vehicles, and 
contactless battery chargers for robots, to provide power to movable or detachable loads 
(Kim et al., 2001; Feezor et al., 2001; Harrison, 2007). As IPT systems extend to more fields, 
better control methods are required to cope with various operating environments and 
satisfy users' needs. Difficulties in controlling the power flow in a wireless/contactless 
power pickup using IPT technologies can arise from several factors, including but not 
limited to load and circuit parameter variations, magnetic field coupling variations between 
the primary and secondary coils, and operating frequency drift of the primary power 
supply (Jackson et al., 2000; Chao et al., 2007). These factors can cause the output voltage of 
the secondary power pickup to deviate significantly from its designed value, which is 
undesirable for applications where a stable output voltage is required. Hence, there is a 
need to develop controllers that work under various operating conditions. 
Practical power flow control of an IPT system can generally be categorized into three 
types: primary power supply control, secondary power pickup control, and coordinated 
control of both primary and secondary circuits. Among these three, direct power flow 
control at the secondary power pickups is most commonly used to stabilize the output 
voltage, particularly for multiple power pickup applications (Hu et al., 2007; Wang et al., 
2006; Gao, 2005). This chapter presents the basic theory and control algorithm of an 
improved directional tuning control method for power flow control of secondary 
contactless/wireless power pickup circuits. 

2. Background of Inductive Power Transfer (IPT) system 

The basic structure of an IPT system is shown in Fig. 1 (Wang et al., 2000; Wang et al., 2005; 
Bieler et al., 2002). The system comprises two electrically isolated parts: the primary power 
supply and the secondary power pickup. The primary power supply is normally stationary 
and consists of a resonant power supply and an elongated conductive path that produces a 
constant AC track current. The secondary movable part, also called the power pickup, is 
mutually coupled with the primary track and moves with respect to the track loop as the 
operation requires. Since the primary and secondary sides are often loosely coupled, the 
induced voltage source is usually unsuitable for direct use in applications. As a result, 
proper tuning and control are essential in the system design to provide a constant DC 
voltage to the load. 



[Block diagram: power input, power converter, AC current in the track loop, magnetic 
coupling, pickup coil, pickup tuning circuit] 

Fig. 1. Basic structure of an IPT system with uncontrolled power pickup. 

Figure 2 shows the structure of a typical IPT power pickup. Ls and Cs represent the 
secondary pickup coil inductance and the tuning capacitance, respectively. A parallel 
tuning configuration is adopted here to boost the induced open-circuit voltage. 




[Schematic: tuning capacitor Cs, AC voltage VAC, switch S, DC capacitor CDC, controller] 
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Fig. 2. Basic structure of an IPT power pickup with shorting-control. 

The open-circuit voltage (Voc) and short-circuit current (Isc) of the pickup coil are governed 
by the following equations: 

$$V_{oc} = j \omega M I_1 \tag{1}$$ 

$$I_{sc} = \frac{V_{oc}}{j \omega L_s} \tag{2}$$ 
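
Equations (1) and (2) can be evaluated directly with complex arithmetic. The sketch below 
is a minimal illustration; the function name and all numerical values (frequency, mutual 
inductance, track current, coil inductance) are assumptions chosen for the example, and M 
and I1 are taken to be the mutual inductance and the primary track current. 

```python
import math


def pickup_voc_isc(freq_hz, mutual_h, track_current_a, ls_h):
    """Open-circuit voltage and short-circuit current of the pickup coil.

    Implements Voc = j*omega*M*I1 and Isc = Voc / (j*omega*Ls).
    """
    omega = 2.0 * math.pi * freq_hz
    v_oc = 1j * omega * mutual_h * track_current_a
    i_sc = v_oc / (1j * omega * ls_h)
    return v_oc, i_sc


# Hypothetical values: 20 kHz, M = 10 uH, I1 = 50 A, Ls = 100 uH
v_oc, i_sc = pickup_voc_isc(20e3, 10e-6, 50.0, 100e-6)
print(f"|Voc| = {abs(v_oc):.2f} V, |Isc| = {abs(i_sc):.2f} A")
```

Because the frequency cancels in (2), the short-circuit current depends only on the ratio 
M/Ls and the track current, while the open-circuit voltage scales with frequency. 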



In Fig. 2, the tuned voltage Vac is converted from AC to DC by the rectifier to provide a DC 
output voltage Vout. To simplify the analysis, the rectifier and load can be represented by 
an equivalent AC resistance Rac. The transfer function of the system given in (3) can be 
derived from the simplified second-order system shown in Fig. 3. It can also be seen that, at 
steady state, the pickup acts as a current source to the load when it is fully tuned. 
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Fig. 3. Simplified second order tuning circuit of power pickup. 
where ω is the system operating frequency. The maximum voltage boost-up factor of the 
power pickup is governed by the Q factor of the tuning circuit, and under the fully tuned 
condition it can be expressed as: 



$$\left| \frac{V_{AC}}{V_{OC}} \right| = \frac{R_{AC}}{\sqrt{\left(R_{AC} - \omega^2 L_s C_s R_{AC}\right)^2 + \left(\omega L_s\right)^2}} \tag{4}$$ 

$$Q = \frac{R_{AC}}{\omega L_s}, \quad \text{if } \omega^2 L_s C_s = 1 \tag{5}$$ 
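
The relation between the boost-up factor and the Q factor can be checked numerically. The 
sketch below assumes that the simplified second-order circuit of Fig. 3 is the open-circuit 
voltage source in series with Ls, feeding Cs in parallel with Rac; that topology, the function 
name and the component values are assumptions made for illustration. 

```python
import math


def boost_factor(omega, ls, cs, r_ac):
    """|Vac / Voc| for Voc in series with Ls, loaded by Cs in parallel with Rac."""
    real_part = r_ac * (1.0 - omega ** 2 * ls * cs)
    imag_part = omega * ls
    return r_ac / math.hypot(real_part, imag_part)


# Hypothetical values: 20 kHz, Ls = 100 uH, Rac = 100 ohm
omega = 2.0 * math.pi * 20e3
ls, r_ac = 100e-6, 100.0
cs_tuned = 1.0 / (omega ** 2 * ls)  # fully tuned: omega^2 * Ls * Cs = 1

print("tuned boost factor  :", boost_factor(omega, ls, cs_tuned, r_ac))
print("Q = Rac / (omega Ls):", r_ac / (omega * ls))
print("10% detuned         :", boost_factor(omega, ls, 1.1 * cs_tuned, r_ac))
```

With the circuit fully tuned the boost factor equals Rac/(ωLs); any detuning of Cs lowers it, 
which is what makes the tuning condition usable as a control variable. 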



The AC equivalent load resistance Rac is given by: 



$$R_{AC} = \frac{\pi^2}{8} R_{load} \tag{6}$$ 



where Rload is the DC load resistance. A DC inductor Ldc is normally added after the 
rectifier to maintain continuous current flow, so that the available power of the secondary 
pickup can be fully delivered to the load. Output voltage regulation is normally achieved 
with a well-known control technique called "shorting control" (Boys et al., 2000; Elliot et al., 
1995; Raabe et al., 2007). Its working principle is similar to that of a boost converter: a 
constant output voltage is maintained by controlling the average current flowing through 
the load, by switching a semiconductor device (S, shown in Fig. 2) on and off using either 
hysteresis or PWM control. However, this controller cannot maintain the fully tuned 
condition of the secondary power pickup circuit, so the maximum power that can be 
transferred may be significantly reduced if the circuit parameters vary. In addition, the 
short-circuit current of the pickup coil has to flow through the switch during the shorting 
period, which causes high power losses, particularly under light loading conditions. These 
unnecessary losses, together with possible circuit mistuning, also reduce the capability of 
the primary power supply to operate with more pickups. 
An alternative method that has been investigated to further improve the power flow control 
is the dynamic tuning/detuning technique (Hu et al., 2004; Si et al., 2006). Figure 4 shows 
the general structure of the dynamic tuning/detuning control scheme. The fundamental 
concept of this control method is to dynamically change the tuning condition of the power 
pickup according to the actual load demand. This helps to maintain the maximum power 
transfer capacity and improve the overall efficiency of the system under light loading 
conditions while keeping the output voltage constant. The control strategy uses a PI 
controller to control the on/off time of a soft-switched tuning inductor/capacitor to obtain 
the desired equivalent inductance/capacitance in the resonant tank. However, because the 
relationship between the tuning component value and the output voltage is bell-shaped 
(shown in Fig. 5), there are two possible operating points, one in the over-tuned region and 
the other in the under-tuned region. If the operating point is accidentally shifted to the 
other region by variations of the circuit parameters, the desired equivalent value may be 
tracked in the wrong direction and the controller consequently fails to control the output 
voltage. 

Fig. 4. Basic structure of an IPT power pickup with dynamic tuning/detuning control. 

To overcome the problems associated with existing control methods for power pickups, 
such as shorting control and dynamic tuning/detuning control, an LCL 
(inductor-capacitor-inductor) based power pickup with a directional tuning control (DTC) 
algorithm is proposed and discussed in detail in this chapter. Its working principle is 
similar to that of the dynamic tuning/detuning control technique. However, instead of 
using a traditional PI controller to perform the tracking, it uses the present and previous 
control results to determine the correct tracking direction for the next step, and retunes the 
circuit to deliver the required power (Hsu et al., 2006). Such an approach covers the full 
tuning curve, so dual-side (full-range) control can be achieved. The proposed controller can 
provide a reliable constant output voltage under various circuit parameter variations, thus 
eliminating the tedious fine-tuning process required by traditional IPT pickups. As a result, 
it is more cost-effective for mass production, with reduced tuning and component tolerance 
requirements. 

[Plot: output voltage versus tuning inductance (or capacitance), showing the resonance 
point, the two possible operating points, the under-tuned region and the over-tuned region] 

Fig. 5. Relationship between tuning inductance/capacitance and output voltage of IPT 
power pickup. 

3. Effects of power pickup parameter variations on output voltage 

In practical operation, the pickup often deviates from its designed operating point because 
of variations in the circuit parameters. Since the resulting deviation of the output voltage 
may not be regulated by a general controller, especially over the full tuning range, the effect 
of each parameter variation on the output voltage needs to be examined individually, so 
that the control range for a given maximum tolerance of the pickup parameters can be 
better understood (Hsu et al., 2007). The circuit parameters considered are the system 
operating frequency, the magnetic coupling between the primary and secondary sides, the 
load resistance and the tuning capacitance. Figure 6 shows the structure of the proposed 
secondary power pickup. An LCL tuning configuration is used to provide a constant output 
voltage to the load under resonant conditions, and a magnetic amplifier in the tuning circuit 
serves as a variable inductor for changing the tuning condition of the power pickup. The 
DC current (Ima) that controls the magnetic amplifier is varied through a transistor 
operating in linear mode, which essentially functions as a variable resistor. The equivalent 
inductance of Ls2 is adjusted by changing the output signal Vm of the DTC algorithm, 
which allows the power pickup to deliver the amount of power required by the load (Hsu 
et al., 2009). 
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Fig. 6. The proposed LCL power pickup with directional tuning control. 

The boost-up factors for ac voltage (Vac) and current (Iac) of the LCL tuning circuit can be 
determined from the following two transfer functions. 
$$k_i = \frac{I_{ac}}{I_{sc}} = \frac{s L_{s1}}{R} \cdot \frac{V_{ac}}{V_{oc}} \tag{8}$$ 

As shown in Fig. 6, the value of Cst can be separated into Cs1 and Cs2, which resonate with 
Ls1 and Ls2 respectively, i.e. ω²Ls1Cs1 = ω²Ls2Cs2 = 1. The AC voltage boost-up factor kr 
under the full resonant condition can be expressed as: 

$$k_r = \frac{V_{ac}}{V_{oc}} = \frac{L_{s2}}{L_{s1}} = \frac{C_{s1}}{C_{s2}} \tag{9}$$ 



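With the circuit parameters considered, the magnitude of the AC boost-up factor kv in (7) 
can be further expressed as: 

$$|k_v| = \frac{\alpha_v \alpha_f \alpha_r k_r R}{\sqrt{\left\{\alpha_r R \left[\alpha_f^2 \alpha_c (k_r + 1) - k_r\right]\right\}^2 + \left\{\omega \left[k_r L_{s1} - \left[\alpha_f^2 \alpha_c (k_r + 1) - k_r\right] L_{s2}\right]\right\}^2}} \tag{10}$$ 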

where αv, αr, αf and αc are the per-unit variations of the open-circuit voltage, load 
resistance, primary operating frequency and tuning capacitance, respectively; each equals 
unity at its nominal value. For example, if the open-circuit voltage increases or decreases by 
10%, αv is set to 1.1 or 0.9, respectively. By rearranging (10) into a quadratic equation in 
Ls2, the solution can be obtained as: 

$$L_{s2} = \frac{k_r L_{s1}}{\alpha_f^2 \alpha_c (k_r + 1) - k_r} \pm \frac{\alpha_r R \sqrt{\left(\alpha_v \alpha_f k_r\right)^2 - \left\{k_{min} \left[\alpha_f^2 \alpha_c (k_r + 1) - k_r\right]\right\}^2}}{\omega k_{min} \left[\alpha_f^2 \alpha_c (k_r + 1) - k_r\right]} \tag{8}$$ 

where kmin is defined as the required minimum ratio between Vac and Voc, reflecting the 
required AC voltage boost-up capability under all possible variations in αv, αr, αf and αc. 
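
The two roots of the quadratic can be checked by substituting them back into the boost-up 
factor expression. The sketch below is a numerical illustration, not the authors' design 
procedure: the function names and component values are hypothetical, and ω is assumed to 
be the actual (drifted) angular frequency, i.e. the nominal value multiplied by the per-unit 
frequency variation. 

```python
import math


def boost_magnitude(ls2, ls1, k_r, r, omega0, a_v=1.0, a_r=1.0, a_f=1.0, a_c=1.0):
    """Magnitude of the AC boost-up factor for a given Ls2 and per-unit variations."""
    a = a_f ** 2 * a_c * (k_r + 1.0) - k_r
    omega = a_f * omega0  # assumption: omega is the drifted frequency
    numerator = a_v * a_f * a_r * k_r * r
    return numerator / math.hypot(a_r * r * a, omega * (k_r * ls1 - a * ls2))


def ls2_solutions(k_min, ls1, k_r, r, omega0, a_v=1.0, a_r=1.0, a_f=1.0, a_c=1.0):
    """Both values of Ls2 that give a boost-up factor of exactly k_min."""
    a = a_f ** 2 * a_c * (k_r + 1.0) - k_r
    if a == 0.0:
        raise ValueError("degenerate case: the Ls2 term vanishes")
    disc = (a_v * a_f * k_r) ** 2 - (k_min * a) ** 2
    if disc < 0.0:
        raise ValueError("k_min cannot be reached with these variations")
    omega = a_f * omega0
    centre = k_r * ls1 / a
    spread = a_r * r * math.sqrt(disc) / (omega * k_min * a)
    return tuple(sorted((centre - spread, centre + spread)))


# Hypothetical design: 20 kHz, Ls1 = 100 uH, k_r = 2, R = 10 ohm, k_min = 1.5
params = dict(ls1=100e-6, k_r=2.0, r=10.0, omega0=2.0 * math.pi * 20e3)
low, high = ls2_solutions(1.5, **params)
print(f"Ls2 = {low * 1e6:.1f} uH or {high * 1e6:.1f} uH")
print("check:", boost_magnitude(low, **params), boost_magnitude(high, **params))
```

The two roots correspond to the two operating points on either side of the tuned point; 
evaluating them at the extreme per-unit variations gives the range that the variable 
inductor has to cover. 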

3.1 System operating frequency variation 

Depending on the design of the primary power supply, the operating frequency may drift, 
which often causes significant power loss because of the mismatch in resonant frequency 
between the primary and secondary sides. This is a major concern particularly in wireless 
power transfer systems that use resonant variable-frequency converters. 

Figure 7 shows the effect of system operating frequency variation on the AC voltage of the 
power pickup. It can be seen from the graph that the tuned point (T-P) shifts as the 
operating frequency drifts. The magnitude of Vac also changes, because the tuning circuit 
requires a different value of Ls2 to achieve resonance, which results in a different kr. Note 
that there are two possible operating points at which Ls2 can compensate for the variation, 
and both are able to keep Vac constant. Depending on the design specifications, the 
designer can choose to work with either the lower or the higher inductance point. 
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Fig. 7. The effect of system operating frequency variation on AC voltage of LCL power 
pickup. 

3.2 Magnetic field coupling variation 

IPT systems are normally used in loosely coupled applications that allow free movement 
between the primary and secondary sides. In such applications, the open-circuit voltage of 
the pickup coil fluctuates because the coupling varies with this movement, and the 
fluctuation needs to be compensated to keep the output voltage constant. 

The effect of magnetic field coupling variation on the AC voltage of the power pickup is 
shown in Fig. 8. It can be seen that the tuned point and the shape of the tuning curve both 
remain the same. Only the magnitude of the open-circuit voltage of the pickup coil changes, 
which results in a different peak value of Vac. 
Fig. 8. The effect of magnetic coupling variation on AC voltage of LCL power pickup. 
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3.3 Load resistance variation 




[Plot legend: R +30% variation, nominal, R -30% variation] 



Fig. 9. The effect of load resistance variation on AC voltage of LCL power pickup. 

Another variable whose effect needs to be studied is the load resistance, which varies as the 
loading condition changes. Fig. 9 shows the effect of load variation on Vac. It can be seen 
that when the load increases, the sensitivity of Vac with respect to Ls2 decreases. 
Conversely, when the load decreases, Vac becomes very sensitive to changes in Ls2. These 
two results indicate that when the power pickup operates under extreme loading 
conditions, either Ls2 will not be able to compensate for the variation, or the tuning circuit 
will be too sensitive with respect to Ls2. 

3.4 Tuning capacitance variation 

Unwanted variations of the tuning capacitance, such as those caused by temperature 
changes, may alter the tuning condition and affect the output voltage. This is particularly 
severe when the secondary system works with a high Q factor, since the circuit then 
becomes extremely sensitive to parameter variations. 

As with the operating frequency variation, both the magnitude of the peak Vac and the T-P 
change after the variation, as can be seen from Fig. 10. Note that as the tuning capacitance 
decreases/increases, the corresponding Ls2 also needs to be increased/decreased to keep 
the circuit tuned, which consequently gives the pickup a different peak Vac (or kr). 



3.5 Determination of range of the tuning inductance 

In practical operation, the system operating frequency, magnetic coupling and load 
resistance, as well as other parameters, may vary simultaneously. To design the variable 
inductor and its controller properly, the worst-case maximum and minimum values of Ls2 
should be identified from the combined effect of the parameter variations concerned, so as 
to cover the full control range. Given the maximum allowed tolerance for each variation, 
the desired maximum and minimum inductance can be calculated using (8) under the 
following conditions: 
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Fig. 10. The effect of tuning capacitor variation on ac voltage of power pick-up. 

1. Maximum Inductance 

• Open circuit voltage, operating frequency, tuning capacitor, and load resistance are 
all at Nominal value - maximum allowed tolerance. 

2. Minimum Inductance 

• Open circuit voltage, operating frequency, tuning capacitor, and load resistance are 
all at Nominal value + maximum allowed tolerance. 

The method presented here can be extended to other possible parameter variations in the 
system for calculating the range of Ls2 in worst-case scenario. 

4. Design of Directional Tuning Control (DTC) algorithm 

In both the shorting control and the dynamic tuning/detuning control methods, a 
traditional PI controller is employed for output voltage regulation and has proven effective 
when the power pickup operates under a single-side tuning condition. Nevertheless, it is 
practically difficult to maintain single-side operation, particularly for high-Q systems. 
System parameter variations may force the pickup to cross from one operating region of the 
tuning curve to the other, and the controller then fails to control the output voltage. The 
Directional Tuning Control (DTC) algorithm has been proposed to overcome the problems 
associated with full-range tuning of the power pickup. The fundamental concept of DTC is 
to compare the present value of the control input with its immediate past value, and then 
use the result to determine the next control action. Instead of depending only on output 
error detection, as traditional controllers do, the proposed controller generates the control 
signal based on the memory of the previous control action, following the procedure 
outlined in the flow chart of Fig. 11. 



4.1 Standard procedure of DTC algorithm 

The flow chart of the DTC algorithm is shown in Fig. 11. The standard procedure of the 
DTC algorithm starts with initialisation. In this step, the controller initialises its settings 
according to the user specifications, which include the sampling time of the controller and 
the initial state of each processing block. Since the algorithm is designed to control the 
power pickup in steady state, variation of the circuit time constant caused by other system 
parameter variations must be accounted for in the initial time delay of the program to 
avoid inaccurate sampling. After initialisation, the present-state output voltage Vout^k is 
sampled, stored, and compared with a voltage reference Vref and with its previously stored 
value Vout^(k-1) to generate the logic signals S1(k) and S2(k), respectively. These signals 
are then collected by the next processing block and checked against a predetermined truth 
table (Table 1) to determine the next-state control signal S4(k). Note that the memory block 
after the decision block stores the present control signal as S3(k), so that it can be used in 
the next execution to check the validity of the present control action. 



[Flow chart: 
Initialisation 
-> Sampling 
-> If Vout^k > Vref then S1(k) = 1, else S1(k) = 0 
-> If Vout^k > Vout^(k-1) then S2(k) = 1, else S2(k) = 0 
-> Updating present-state logic 
-> Inc. or Dec. Cs according to decision-making truth table, giving S4(k) 
-> Memory block of present control signal, fed back as S3(k) or S4(k-1)] 



Fig. 11. Flow chart of the directional tuning control algorithm. 
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Table 1. Truth table for Ls2 increasing direction determination. 

The simplified Boolean expression corresponding to Table 1 and the actual output signal of 
the controller can be expressed by: 

$$S_4 = \bar{S}_3 \left(S_1 \odot S_2\right) + S_3 \left(S_1 \oplus S_2\right) \tag{9}$$ 

$$U(k) = U(k-1) + (-1)^{S_4 + 1} \cdot \Delta h \tag{10}$$ 

where U(k) is the present-state control signal, U(k-1) is the previous-state control signal, 
and Δh is the step size of the adjustment. 
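
The direction logic and the update rule can be exercised on a toy tuning curve. The sketch 
below is an illustration under stated assumptions, not the authors' implementation: it 
assumes S1 = 1 when the output exceeds the reference, S2 = 1 when the output has risen 
since the previous sample, S3 is the previous direction, and S4 = 1 means "increase". The 
bell-shaped curve, its parameters and all function names are hypothetical. 

```python
import math


def dtc_direction(s1, s2, s3):
    """Next tuning direction S4 (1 = increase, 0 = decrease)."""
    xor = s1 ^ s2
    return xor if s3 else 1 - xor


def dtc_step(v_out, v_prev, v_ref, s3, u_prev, dh):
    """One DTC iteration: returns the new control value and the direction used."""
    s1 = int(v_out > v_ref)   # output above the reference?
    s2 = int(v_out > v_prev)  # output rose since the previous sample?
    s4 = dtc_direction(s1, s2, s3)
    return u_prev + (-1) ** (s4 + 1) * dh, s4


def bell_curve(u, peak=10.0, centre=5.0, width=1.0):
    """Hypothetical static tuning curve: output voltage versus control value."""
    return peak / math.sqrt(1.0 + ((u - centre) / width) ** 2)


def run_dtc(u0, v_ref=5.0, dh=0.01, steps=500):
    """Run the DTC loop from the starting point u0 on the toy curve."""
    u, s3 = u0, 1
    v_prev = bell_curve(u)
    for _ in range(steps):
        v_out = bell_curve(u)
        u_next, s4 = dtc_step(v_out, v_prev, v_ref, s3, u, dh)
        v_prev, s3, u = v_out, s4, u_next
    return u, bell_curve(u)


for start in (2.0, 8.0):  # one start in each tuning region
    u_end, v_end = run_dtc(start)
    print(f"start {start}: u = {u_end:.2f}, output = {v_end:.2f}")
```

Because the direction is derived from the observed effect of the previous step rather than 
from the sign of the error alone, the loop settles near the reference from either side of the 
curve, with a residual ripple set by the step size. 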

4.2 Fuzzy logic control for automatic selection of tuning step-size Δh 

Although the DTC algorithm can effectively control the output voltage of the pickup, the 
control quality is still limited by the predefined tuning step size. A large step change in the 
inductance often causes chattering of the output voltage. The chattering can be reduced by 
using a smaller step change in the inductance, but this makes the overall response sluggish. 
To overcome the chattering problem while keeping the overall response fast, a fuzzy logic 
controller is integrated with the classical DTC algorithm to further improve the 
performance of the controller (Hsu et al., 2008). The objective of the fuzzy logic controller is 
to dynamically determine the step change Δh of the tuning inductance in (10). 



4.2.1 Fuzzification 

Design of the fuzzy controller consists of fuzzification, formulation of control rule base, and 
defuzzification. In the process of fuzzification, operating region of the controller is designed 
to allow error and rate of error to lie inside a predetermined interval (-L, L). The inputs to 
the fuzzy PI controller are given as: 



$$GE \cdot e(n) = GE \cdot [y_r(n) - y(n)] \tag{11}$$

$$GR \cdot r_1(n) = GR \cdot [e(n) - e(n-1)] \tag{12}$$

$$GR \cdot r_2(n) = GR \cdot [\,|e(n)| - |e(n-1)|\,] \tag{13}$$



where y(n) is the output voltage, y_r(n) is the reference signal, e(n) is the error signal, and
GE and GR are scaling factors for the error and the rate of error, respectively. Since the rate
of error is calculated from values of the output voltage at two consecutive sampling instances,
i.e. n and n-1, the rate of error r(n) has been further separated into two different variables
r_1(n) and r_2(n), where r_1(n) represents the rate of error when the output voltages at both
sampling instances, i.e. y(n) and y(n-1), lie either above or below the reference value, and
r_2(n) represents the rate of error when the output voltages at these two instances lie in
different regions with respect to the reference value. The membership functions for error
positive (e_p), error negative (e_n), rate positive (r_p), and rate negative (r_n) can be
calculated from the following expressions:

$$\mu_{e_p} = \frac{L + GE \cdot e(n)}{2L}, \qquad \mu_{e_n} = \frac{L - GE \cdot e(n)}{2L} \tag{14}$$

$$\mu_{r_p} = \frac{L + GR \cdot r(n)}{2L}, \qquad \mu_{r_n} = \frac{L - GR \cdot r(n)}{2L} \tag{15}$$



However, a simple fuzzy PI controller will fail to eliminate the chattering effect at the output
voltage, since the positive and negative errors calculated using (14) could be equal and
cancel each other out. Therefore a D controller is introduced here with a new set of
inputs given by:

$$GD \cdot y_d(n) = GD \cdot |y_r(n) - y(n)| = GD \cdot |e(n)| \tag{16}$$

$$GM \cdot \Delta y(n) = GM \cdot |y(n) - y(n-1)| \tag{17}$$

where y_d(n) is the absolute value of the error, Δy(n) is the absolute value of the rate of
output, and GD and GM are scaling factors for the absolute error and the absolute rate of
output, respectively. The membership functions for absolute error large (y_dl), absolute error
zero (y_dz), absolute rate of output large (Δy_l), and absolute rate of output zero (Δy_z) are
given as:

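$$\mu_{y_{dl}} = \frac{GD \cdot y_d(n)}{L}, \qquad \mu_{y_{dz}} = 1 - \frac{GD \cdot y_d(n)}{L} \tag{18}$$

$$\mu_{\Delta y_l} = \frac{GM \cdot \Delta y(n)}{L}, \qquad \mu_{\Delta y_z} = 1 - \frac{GM \cdot \Delta y(n)}{L} \tag{19}$$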



4.2.2 Control rule base 

The control rules for the normal tuning operation are as follows:

R1: If GE·e(n) is e_p and GR·r(n) is r1_p Then Δu_PI(n) is o_l.
R2: If GE·e(n) is e_p and GR·r(n) is r1_n Then Δu_PI(n) is o_z.
R3: If GE·e(n) is e_n and GR·r(n) is r1_p Then Δu_PI(n) is o_z.
R4: If GE·e(n) is e_n and GR·r(n) is r1_n Then Δu_PI(n) is o_l.

An extra set of four control rules for reducing the output chattering are:

R5: If GE·e(n) is e_p and GR·r(n) is r2_p Then Δu_PI(n) is o_l.
R6: If GE·e(n) is e_p and GR·r(n) is r2_n Then Δu_PI(n) is o_z.
R7: If GE·e(n) is e_n and GR·r(n) is r2_p Then Δu_PI(n) is o_l.
R8: If GE·e(n) is e_n and GR·r(n) is r2_n Then Δu_PI(n) is o_z.

The D controller considered here has only four control rules since it only takes the absolute
value of the error and the rate of output as its inputs.

R9: If GD·y_d(n) is y_dl and GM·Δy(n) is Δy_l Then Δu_D(n) is o_z.
R10: If GD·y_d(n) is y_dl and GM·Δy(n) is Δy_z Then Δu_D(n) is o_z.
R11: If GD·y_d(n) is y_dz and GM·Δy(n) is Δy_l Then Δu_D(n) is o_l.
R12: If GD·y_d(n) is y_dz and GM·Δy(n) is Δy_z Then Δu_D(n) is o_l.

In the above rules, Δu_PI(n) and Δu_D(n) stand for the crisp incremental outputs of the fuzzy PI
controller and the fuzzy D controller, respectively.

4.2.3 Defuzzification

Defuzzification of the outputs of the fuzzy PI and fuzzy D controllers is carried out using
the center of gravity algorithm, and the outputs are expressed as:

$$\Delta u_{1PI} = \frac{H \cdot S(\mu_{R_1R_4}) + 0 \cdot S(\mu_{R_2R_3})}{S(\mu_{R_1R_4}) + S(\mu_{R_2R_3})} \tag{20}$$

$$\Delta u_{2PI} = \frac{H \cdot S(\mu_{R_5R_7}) + 0 \cdot S(\mu_{R_6R_8})}{S(\mu_{R_5R_7}) + S(\mu_{R_6R_8})} \tag{21}$$

$$\Delta u_{D} = \frac{H \cdot S(\mu_{R_{11}R_{12}}) + 0 \cdot S(\mu_{R_9R_{10}})}{S(\mu_{R_{11}R_{12}}) + S(\mu_{R_9R_{10}})} \tag{22}$$

where the memberships of the output fuzzy sets for the control rule pairs R1R4, R2R3, R5R7,
R6R8, R9R10 and R11R12 are obtained from the Lukasiewicz fuzzy logic OR, i.e.
$\mu_{R_iR_j} = \min(\mu_{R_i} + \mu_{R_j}, 1)$. The function S(μ) is computed using Mamdani
inference:

$$S(\mu) = \mu\,(2 - \mu)\,H \tag{23}$$

The actual output of the controller which determines the tuning step-size for the variable
capacitor is given by:

$$GU \cdot \Delta u = GU \cdot (\Delta u_{PI} - \Delta u_{D}) \tag{24}$$

where GU is a scaling factor for the crisp incremental output of the fuzzy PID controller.
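The fuzzification, rule evaluation and defuzzification steps above can be sketched in Python as follows. This is a minimal illustration under several assumptions that are not stated in the text: the rule antecedents are combined with a minimum (AND) operator, a single rate input is used instead of selecting between r_1(n) and r_2(n), the absolute-value memberships are normalised by L, the output sets are read as "large" (value H) for rule pairs R1/R4 and R11/R12 and "zero" for the others, and all gains, L and H are set to 1. All function and variable names are hypothetical.

```python
def clamp(x, lo, hi):
    """Limit x to the interval [lo, hi]."""
    return max(lo, min(hi, x))


def lukasiewicz_or(a, b):
    """Lukasiewicz OR of two memberships: min(a + b, 1)."""
    return min(a + b, 1.0)


def area(mu, H):
    """S(mu) = mu * (2 - mu) * H, as in (23)."""
    return mu * (2.0 - mu) * H


def centre_of_gravity(mu_large, mu_zero, H):
    """Centre of gravity with output values H (large) and 0 (zero)."""
    den = area(mu_large, H) + area(mu_zero, H)
    return H * area(mu_large, H) / den if den > 0.0 else 0.0


def fuzzy_step(e, r, dy, L=1.0, H=1.0, GE=1.0, GR=1.0, GD=1.0, GM=1.0, GU=1.0):
    """Return GU * (du_PI - du_D) for error e, rate r and output change dy."""
    xe = clamp(GE * e, -L, L)
    xr = clamp(GR * r, -L, L)
    xd = clamp(GD * abs(e), 0.0, L)
    xm = clamp(GM * abs(dy), 0.0, L)
    # Fuzzification of the signed inputs
    e_p, e_n = (L + xe) / (2 * L), (L - xe) / (2 * L)
    r_p, r_n = (L + xr) / (2 * L), (L - xr) / (2 * L)
    # Fuzzification of the absolute-value inputs (assumed normalisation by L)
    d_l, m_l = xd / L, xm / L
    d_z, m_z = 1.0 - d_l, 1.0 - m_l
    # PI rules: large output when error and rate share the same sign
    pi_large = lukasiewicz_or(min(e_p, r_p), min(e_n, r_n))
    pi_zero = lukasiewicz_or(min(e_p, r_n), min(e_n, r_p))
    # D rules: large output when the absolute error is zero
    d_large = lukasiewicz_or(min(d_z, m_l), min(d_z, m_z))
    d_zero = lukasiewicz_or(min(d_l, m_l), min(d_l, m_z))
    du_pi = centre_of_gravity(pi_large, pi_zero, H)
    du_d = centre_of_gravity(d_large, d_zero, H)
    return GU * (du_pi - du_d)


step_far = fuzzy_step(e=1.0, r=1.0, dy=0.0)   # large error
step_near = fuzzy_step(e=0.0, r=0.0, dy=0.0)  # zero error
```

Under these assumptions the step shrinks as the error approaches zero. The raw value is not bounded below by zero, so any clipping policy is left to the caller.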
5. Simulation results

To illustrate the effectiveness of the proposed fuzzy based DTC algorithm, a power pickup 
model has been created in MATLAB Simulink and PLECS. 
Fig. 12. Simulink model of LCL based power pickup with DTC.

The secondary power pickup model with DTC is shown in Fig. 12. Operating conditions of
the power pickup can generally be categorized into four cases: Under-Tuned with Low
Start-up Voltage (UT-LSV), Under-Tuned with High Start-up Voltage (UT-HSV), Over-Tuned
with Low Start-up Voltage (OT-LSV), and Over-Tuned with High Start-up Voltage (OT-HSV).
However, their results are similar to each other during the control process and therefore
only two of them are presented here.

The simulation results for V_out and L_s2 are shown in Fig. 13(a) and (b), respectively, when
the power pickup is operating under UT-LSV. The simulation runs from circuit start-up, with a
predetermined delay of 0.05 s (to separate the initialization from the actual control process
and ease observation), until the output reaches the desired voltage level (5 V). As the error
is reduced, the step change in the tuning inductance also decreases, which removes the output
chattering effect.

Fig. 14 shows the simulation results of the controlled power pickup operating under OT-HSV.
As can be seen from the results, both UT-LSV and OT-HSV give a similar outcome in
providing a constant voltage at the output.

From the simulation studies of the controlled power pickup under different operating
conditions, it was observed that the proposed controller is capable of controlling the output
voltage to the desired value with a response time of 0.1-0.25 s. However, the sampling
frequency of the controller has to be selected carefully to achieve a more efficient output
voltage regulation.
Fig. 13. Waveform of: a) output voltage of power pickup and b) tuning inductance, with
Fuzzy based DTC algorithm controlled power pickup operating under UT-LSV.

Fig. 14. Waveform of: a) output voltage of power pickup and b) tuning inductance, with
Fuzzy based DTC algorithm controlled power pickup operating under OT-HSV.

6. Conclusions

A fuzzy-based tuning step-size adjuster has been integrated with the directional tuning
controller to automatically determine the tuning step-size and to effectively regulate
the output voltage of the power pickup for inductive power transfer systems. The integrated
controller has solved the directional tracking problem of the traditional PI dynamic
tuning/detuning controller and hence achieved full-range power flow control of the
secondary power pickup. The simulations performed in MATLAB Simulink and PLECS
have demonstrated the effectiveness of the controller under different testing conditions, and
it has been shown that a desired constant output voltage can be maintained using the
proposed controller without the chattering effect. Within a certain allowable tolerance of the
pickup circuit parameters, the controller can automatically find the correct tuning direction.
This eases the circuit component selection in design and eliminates the tedious fine-tuning
process in practical implementation.

7. Future research 

As the fuzzy based directional tuning control algorithm is developed in discrete-time 
domain, sampling frequency becomes a very important factor which often affects the 
performance of the controller. Although the power pickup system will never go unstable 
since the output voltage is confined by the tuned-point, the true control result of each 
control action and the response time of the controller are still significantly affected by the 
sampling frequency. Two aspects, namely the magnitude of the voltage variation after each
control action and the time constant of the DC filter of the power pickup, have been
preliminarily investigated. However, a clear relationship between these two variables has
not yet been found and therefore needs to be further explored.
1. Introduction

The growing markets for electronic components in automotive electronics, LCD/LED drivers
and TV sets lead to an extensive demand for high-voltage integrated circuits (HVICs), which
are normally built with HV-MOSFETs. These HV-MOSFET devices generally occupy large die
areas and operate at low speed due to large parasitic capacitance and small trans-conductance
(g_m). There are two types of HV-MOSFET devices, namely, thick-gate and thin-gate oxide
devices. Thick-gate oxide devices can sustain a high gate-to-source voltage, V_GS, but suffer
from a reduced g_m, poor threshold voltage (V_T) control in production and higher cost due to
the need for extra processing steps. Thin-gate devices have a larger g_m, smaller parasitic
capacitance, fewer processing steps and a lower cost. These properties make the thin-gate
HV-MOSFETs attractive, though they face a severe limitation on V_GS swing. There are
two main concerns when thin-gate HV-MOSFETs are used. The first is how to achieve high
current driving capability to drive capacitive loads in high-voltage (HV) applications,
whereas the second is how to protect the thin-gate oxide from HV stress breakdown. For
current-driving capability, Bales (Bales, 1997) proposed a class-AB amplifier using bipolar
technology which consumes a high quiescent current and is expensive due to a large die
area and complicated masking. Lu & Lee (Lu & Lee, 2002) proposed a CMOS class-AB
amplifier which can only drive around 6 mA and does not meet the driver requirements of
large and fast current responses (Hu & Jovanovic, 2008). Mentze et al. (Mentze et al., 2006)
proposed a HV driver using pure low-voltage (LV) devices, but this architecture requires an
expensive silicon-on-insulator (SOI) process to sustain substrate breakdown in HV
applications. Tzeng & Chen (Tzeng & Chen, 2009) proposed a driver that consumes a large
die area with all transistors inside the circuit being HV transistors. On the other hand,
transistor reliability becomes a serious issue in HV thin-gate oxide transistor circuits. Chebli
et al. (Chebli et al., 2007) proposed the floating gate protection technique. The voltage range
under protection changes according to the ratio of capacitors and the HV supply, V_DDH.
This technique, however, cannot limit the gate-to-source voltage well when the variation of
the supply voltage is large. Riccardo et al. (Riccardo et al., 2001) proposed a method which
requires an extra Zener diode to protect the thin-gate oxide transistors, so a special process
and higher cost are incurred. Declercq et al. (Declercq et al., 1993) suggested a HV-MOSFET
op-amp driver with a clamping circuit to protect the thin-gate oxide, but it consumes a
significant amount of die area as all devices are HV-MOSFETs.
To overcome these drawbacks, the main aims of the proposed driver architecture are:

a. to minimize the number of HV devices so as to save die area in HV application. 

b. to develop a HV driver with fast transient responses. 

c. to develop reliable thin-gate protection circuitry in HV application, so as to enjoy cost
saving from reduced processing steps and take advantage of better V_T process control
and higher current gain g_m compared with the thick-gate HV-MOSFET counterparts.

As a result, a HV high-speed regulated driver is developed using mostly LV-MOSFETs with 
the minimum number of thin-gate HV-MOSFETs. 

In this chapter, we present a high-speed CMOS driver that operates with a HV 7V-to-30V
supply, delivering an output drive of up to 190 A/µs at a regulated 4.8 V output voltage. It is
particularly suitable for HV applications such as LCD/LED/AC-DC drivers loaded with
power (MOS)FETs. The circuit consists of only 5 V LV devices and two thin-gate HV
asymmetrical MOS transistors (HV-MOSFETs) fully compatible with standard CMOS
technology. The design is a small-area, cost-effective solution, measuring only
650 µm × 200 µm in a 0.5 µm standard 5V/40V (V_GS/V_DS) CMOS process. The regulated
output driver can adjust itself to the desired V_GS, helping to fully utilize the effect
of V_GS on minimizing the on-resistance, R_ds-on, of the power FET. Novel thin-gate
protection circuits, based on source-follower (SF) configurations, have been deployed to
limit the V_GS swing to within 5 V for the HV-MOSFETs. A dual-loop architecture provides an
extremely fast slew rate and transient response under a low quiescent current of 90 µA in its
static state and 860 µA during switching. A dead-time circuit is included to eliminate the
power loss incurred by shoot-through current, saving 75 mW under a 30 V HV supply.
Moreover, stability analysis and compensation techniques are described in detail to ensure
stable operation of the driver in both loaded and unloaded conditions. Lab measurements
are in good agreement with simulations. A comparison with existing works then
demonstrates the efficacy of the proposed design.

In this chapter, Section 2 introduces the use of LV devices to build a HV high-speed regulated
driver, together with stability analyses for the cases when the power FET load is ON and
OFF. Section 2 also discusses the power-saving techniques in driving HV-MOSFETs. In
Section 3, simulation and lab measurement results are shown which confirm the merits of
the proposed design. Finally, the conclusion is drawn in Section 4.

2. Principles of operation 

2.1 Circuit structure and basic operation 

Fig. 1(a) shows the high-level block diagram of the proposed driver. It consists of a LV error
amplifier, a HV thin-gate protection circuit, a feedback resistor network with pole-zero
cancellation and a fast transient regulated driver with dead-time control. A HV nMOS,
hvn01, is connected to the node V_reg in a SF configuration. The driver requires a LV supply,
V_DDL, as well as a HV supply, V_DDH. We first develop an internal regulator, which gives a
4.8 V DC voltage, V_reg, through hvn01. The drain of hvn01 is connected to V_DDH, which is 30 V
in our design. V_reg acts as a supply voltage to a chain of inverter buffers, which in turn
drive the output load at the node V_out. The switching activity propagates from V_in all the
way to V_out. The output load here is the gate of a 1 A on-chip thin-gate power (MOS)FET. The
equivalent gate capacitance is around 270 pF. The driver provides a 4.8 V output and
therefore protects the thin gate of the loading power FET by limiting its V_GS to just below 5 V.
The node V_out can also be connected externally to drive external power FETs. The
regulated output driver can always adjust itself to the desired V_GS, helping to fully
utilize the effect of V_GS on the on-resistance, R_ds-on, of the power FET. In this connection,
and with reference to (1) and (2) (Gray et al., 1990), the on-resistance and die area of the
on-chip power FET can be minimized:



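$$I_{DS} = \frac{\mu_n C_{ox}}{2}\,\frac{W}{L}\,(V_{GS} - V_T)^2 \tag{1}$$

$$R_{ds\text{-}on} = \frac{1}{\mu_n C_{ox}\,\dfrac{W}{L}\,(V_{GS} - V_T)} \tag{2}$$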

Equation (1) describes the behavior of a (HV) nMOS in the saturation region, while (2)
approximates the turn-on resistance of a (HV) nMOS in the linear region. I_DS is the current
flowing from the drain to the source of a MOSFET. V_GS is the gate-to-source voltage. V_T is the
threshold voltage to turn on the MOSFET. Also, µ_n is the mobility of electrons and C_ox is the
gate-oxide capacitance per unit area, whereas W and L are the width and length of the
transistor, respectively.
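To make the dependence of the on-resistance on V_GS concrete, the short Python sketch below evaluates the standard square-law expressions for the saturation current and the linear-region on-resistance. The process constant, the W/L ratio and the threshold voltage are hypothetical values chosen for round numbers; they are not taken from the design.

```python
def ids_saturation(k_prime, w_over_l, v_gs, v_t):
    """Square-law drain current in saturation, in amperes."""
    return 0.5 * k_prime * w_over_l * (v_gs - v_t) ** 2


def rds_on(k_prime, w_over_l, v_gs, v_t):
    """Approximate on-resistance in the linear region, in ohms."""
    return 1.0 / (k_prime * w_over_l * (v_gs - v_t))


K_PRIME = 50e-6    # mu_n * C_ox in A/V^2 (hypothetical)
W_OVER_L = 10_000  # width-to-length ratio of the power FET (hypothetical)
V_T = 1.0          # threshold voltage in volts (hypothetical)

r_low_drive = rds_on(K_PRIME, W_OVER_L, 3.0, V_T)  # gate driven to 3.0 V
r_regulated = rds_on(K_PRIME, W_OVER_L, 4.8, V_T)  # gate driven to 4.8 V
resistance_ratio = r_low_drive / r_regulated
```

For a fixed target on-resistance, the same ratio is the factor by which W/L (and hence die area) could be reduced when the gate is driven to the higher voltage.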




[Figure labels: Voltage Regulation Loop; Fast Source-Follower Loop]




Fig. 1. (a) Architecture of the proposed driver; (b) Dual-loop structure in the driver 
Fig. 2. Detailed schematic of the proposed driver
| Parameter | Power FET OFF, without z_f | Power FET OFF, with z_f | Power FET ON, without z_f | Power FET ON, with z_f |
|---|---|---|---|---|
| Loop gain (dB) | 45 | 45 | 45 | 45 |
| Phase margin (deg) | 83.7 | 109 | 57.5 | 77.1 |
| Unity-gain frequency (UGF) (kHz) | 27.1 | 33.8 | 24.5 | 28.1 |
| Gain margin (dB) | 35.7 | 31 | 18.2 | 21.9 |
| Gain margin frequency (kHz) | 623.6 | 918.3 | 94.6 | 184.6 |
| Source follower unity-gain frequency (MHz) | >100 (Gray et al., 1990) | >100 | >100 | >100 |

Table I. Summary of frequency responses of the driver with output = high and output = low



2.2 Regulated driver with fast transient response 
2.2.1 Fast dual-loop operation 

As shown in Figs. 1 & 2, there are two loops in the driver, namely, the voltage-regulation
(VR) loop and the source-follower (SF) loop, which together achieve fast transient responses.
Firstly, for the VR loop, the error amplifier senses V_reg through the resistor network and
amplifies the error signal between the scaled V_reg and the reference voltage V_ref. The error
signal is then shifted up to a higher voltage through the thin-gate protection circuit and
regulates hvn01 to correct the error, thereby generating a steady and accurate V_reg. Secondly,
for the SF loop, the SF configuration of hvn01 itself is a fast feedback loop. Referring to (1)
and Fig. 1, the feedback mechanism is as follows: when the node V_reg goes down due to a load
current change, the gate-to-source voltage of hvn01, V_GS-hvn01, increases and sources a larger
output current to charge up the node V_reg again. The main function of the VR loop is to
provide a regulated voltage of around 4.8 V in the steady state, while the fast SF loop
provides an immediate response when there is a sudden load change.

2.2.2 Loop gain analysis with the power FET being ON/OFF 

We first analyze the SF loop and later the VR loop. The SF loop is well known for its
fast response, with its unity-gain frequency (UGF) in the 100 MHz to 1 GHz range (Gray et al.,
1990). Its pole effect is generally beyond the UGF of the VR loop and therefore negligible.
For the VR loop, there are two scenarios in the stability analysis: the power FET ON and the
power FET OFF. When it is ON, C_L = C_L-ON = C_powerFET ≈ 270 pF, and when it is OFF,
C_L = C_L-OFF ≈ 0 pF. Here C_powerFET is the equivalent gate capacitance of the power FET.
The AC simulation with and without the power FET is shown in Table I and Fig. 3. The phase
margin of the VR loop is larger when the power FET is OFF. This can be explained by the
following loop gain analysis:



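$$T(s) = A(s)\,\frac{R_2}{R'}\left(\frac{1+\dfrac{s}{z_{hvp01}}}{1+\dfrac{s}{p_{hvp01}}}\right)\left(\frac{1+\dfrac{s}{z_{hvn01}}}{1+\dfrac{s}{p_{hvn01}}}\right)\left(\frac{1+\dfrac{s}{z_f}}{1+\dfrac{s}{p_f}}\right) \tag{3}$$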



where z_hvp01, z_hvn01, p_hvp01 and p_hvn01 are the zeros and poles from hvp01 and hvn01,
respectively. A(s) is the transfer function of the error amplifier. R' = R_1 + R_2. The zeros
and poles are defined in (4), where g_m,hvp01, g_m,hvn01, C_gs,hvp01 and C_gs,hvn01 are the
trans-conductances and gate capacitances of hvp01 and hvn01, respectively, whereas r_o,Ibias
is the output impedance of the current source I_bias. We assume the gains of the SF
configurations formed by hvp01 and hvn01 are unity. Also, p_hvn01 is the pole contributed by
hvn01, where p_hvn01 = p_hvn01-ON and p_hvn01 = p_hvn01-OFF when the power FET is ON and
OFF, respectively. Typically, the zeros are located at higher frequencies than the poles in
the SF configuration, except for p_hvn01. As C_L-ON ≈ 270 pF >> C_L-OFF ≈ 0, the pole
p_hvn01-ON << p_hvn01-OFF. A double-pole effect before the UGF then occurs and may lead to
instability when C_L = C_L-ON = C_powerFET, i.e. when the power FET is ON.

To avoid instability, we designed a feedback-resistive network which creates a
medium-frequency zero to guarantee stability. Referring to R_1, R_2 and C_1 in Fig. 2,

$$\frac{V_{g,n04}}{V_{reg}} = \frac{R_2}{R'}\cdot\frac{1+\dfrac{s}{z_f}}{1+\dfrac{s}{p_f}} \tag{5}$$

where V_reg and V_g,n04 are the voltages at the node V_reg and at the gate of n04,
respectively. The frequency of the zero, z_f, is lower than the pole frequency, p_f, and this
zero can be used to cancel the effect of the pole p_hvn01-ON.
Fig. 3. (a) Simulated loop gain of the proposed driver with power FET ON; (b) Simulated
loop gain of the proposed driver with power FET OFF

In order to have z_f << p_f, R_2 should be much smaller than R_1. From Fig. 3, the phase margin
is very good even when the power FET is ON. However, if C_1 is not inserted, the double-pole
effect will be significant. Results in Table I clearly show the pole-zero cancellation.
When the power FET is OFF, the phase margins are around 109° and 83° with and without
the zero z_f, respectively. When the power FET is ON, the phase margins are around 77° and
57° with and without the zero z_f, respectively. The differences in phase margin, with and
without the zero z_f, are around 20° to 25° in both cases. With the pole-zero cancellation
technique, the unity-gain frequencies are also larger in both the power FET ON and OFF cases.
These differences are significant in stability and transient analyses. The larger the phase
margin, the less the ringing. A larger phase margin also gives a shorter settling time
(Gray et al., 1990). The lab measurement results in Section 3 will demonstrate the steady and
fast transient responses of the driver, thereby verifying the usefulness of the pole-zero
cancellation technique in this type of regulated gate driver.
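The phase benefit of placing a zero below its companion pole can be illustrated with the generic phase expression of a single zero-pole pair, atan(f/z) - atan(f/p). The Python sketch below is only an illustration of that expression: the corner frequencies are hypothetical, they are not the values of z_f and p_f in the design, and the function name is an assumption.

```python
import math


def phase_lead_deg(f, f_zero, f_pole):
    """Phase added at frequency f by one zero-pole pair, in degrees."""
    return math.degrees(math.atan(f / f_zero) - math.atan(f / f_pole))


F_ZERO = 10e3  # zero frequency in Hz (hypothetical)
F_POLE = 40e3  # pole frequency in Hz (hypothetical)

# The lead of a zero-pole pair peaks at the geometric mean of the two corners.
f_peak = math.sqrt(F_ZERO * F_POLE)
peak_lead = phase_lead_deg(f_peak, F_ZERO, F_POLE)
lead_at_zero = phase_lead_deg(F_ZERO, F_ZERO, F_POLE)
```

A wider separation between the zero and the pole increases the available phase lead, which is consistent with the requirement that the zero lie well below the pole.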

2.3 Power-saving: LV devices in HV application 

HV devices differ from the normal LV ones in several ways. The size of a HV transistor is 
much larger than that of a LV transistor (Murari et al., 1995). There are several problems in 
using HV devices as inverter chains to drive power FETs, namely, 

a. Large parasitic capacitance: The larger HV transistors result in larger parasitic
capacitance. The dynamic power, which scales with the product of the capacitance (C) and
the square of the voltage (V), CV², is directly proportional to the parasitic capacitance. As a
result, the total power consumption of a HV inverter is much higher than that of the LV
one. The number of stages also trades off with the rise and fall times of the driver
output and subsequently the delay of the driver output signal.

b. Severe V_GS limitation for thin-gate devices: Though LV devices are preferred, there is a
gate-to-source (V_GS) swing limitation when LV devices are used in HV applications. If the
gate-to-source voltages of the pMOS and nMOS inside the inverters are above 5 V, we
must use thick-gate devices. The gate capacitance of the thick-gate devices is large and
will therefore lengthen the rise and fall times and the propagation delay. It also
increases the cost, as an extra processing step for the thick gate is needed.

c. Significant power loss in shoot-through current: During switching of the inverter chain, 
there is a shoot-through current flowing from the V reg node to ground. Such dynamic 
current causes the V reg voltage to drop (Heydari & Pedram, 2003). Since the operating 
voltage is 30V, the power of the shoot-through current still contributes much to the 
power loss. 

d. Large die area: using HV-MOSFETs will occupy huge die areas and hence increase the 
wafer cost. 

In the following, we propose solutions to solve the above problems by employing LV 
devices in HV driver application. 

2.3.1 Power saving & thin-gate protection in the regulated driver 

In the proposed design, we use all LV transistors (5 V) in HV (30 V) applications except two
HV thin-gate transistors. This approach results in low dynamic power consumption and a
small die area. We use LV devices to construct the inverter chain. The supply voltage of the
inverters is given by the internal regulator at the V_reg node, which maintains a 4.8 V supply.
This node is connected to the source of hvn01, whose drain is connected to V_DDH. This
connection ensures that the internal regulator can give sufficient current to the inverter
chain to drive the load. The maximum current is limited by the size of hvn01; the internal
supply voltage V_reg will go down if the loading current is too large. This regulated driver
approach helps protect the thin-gate oxide of the power FET from damage by HV stresses.




Fig. 4. Dead-time circuit 
Fig. 5. Shoot-through current



2.3.2 Power saving via dead-time circuit 

There are several ways to reduce the shoot-through current. In the proposed circuit, a
dead-time control circuit is added for this purpose. This dead-time circuit prevents the flow
of shoot-through current by break-before-make logic. Fig. 4 shows the dead-time circuit and
Fig. 5 shows the current going from the V_reg node to ground when the driver is charging up
the load. With the dead-time circuit, this current peaks at only 0.77 mA, which is one-fifth
of that of the driver without the dead-time circuit. The 0.77 mA current is mainly due to the
switching activity of the dead-time logic. Since the dead-time circuit eliminates the
shoot-through current of the final-stage driver, the driver possesses a higher slew rate and
higher efficiency in driving the output capacitive load. The original driver has a 97 ns rise
time and a 39.41 V/µs slew rate, while the one with the dead-time circuit achieves 73 ns and
52.06 V/µs, respectively. The slew rate is improved by 32% owing to a larger portion of
current charging up the output load C_L instead of being shunted to ground as shoot-through
current.
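The quoted improvement can be checked directly from the rise times and slew rates given above. The following Python lines are a simple arithmetic check; the variable and function names are hypothetical.

```python
def percent_change(old, new):
    """Relative change from old to new, in percent."""
    return (new - old) / old * 100.0


# Figures quoted in the text for the driver without and with the dead-time circuit.
RISE_TIME_NS = {"without_dead_time": 97.0, "with_dead_time": 73.0}
SLEW_V_PER_US = {"without_dead_time": 39.41, "with_dead_time": 52.06}

slew_gain_pct = percent_change(SLEW_V_PER_US["without_dead_time"],
                               SLEW_V_PER_US["with_dead_time"])
rise_change_pct = percent_change(RISE_TIME_NS["without_dead_time"],
                                 RISE_TIME_NS["with_dead_time"])
```

The slew-rate change evaluates to roughly 32%, in line with the stated figure, and the rise time shortens by roughly a quarter.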



2.3.3 Thin-gate protection circuit 

Using the SF configuration as thin-gate protection circuitry for HV-MOSFETs is one of the
innovations in this design. Referring to Fig. 2, the gate voltage of hvn01, V_G-hvn01, is
limited by the SF configuration, where V_G-hvn01 ≈ (V_DDL - V_DS-p01 + V_GS-hvp01) ≈
(4.5 - 0.2 + 1.2) = 5.5 V. The gate-to-source voltage of hvn01 is V_GS-hvn01 ≈ V_G-hvn01 -
V_reg = 5.5 - 4.5 ≈ 1 V. The gate-to-source voltage of hvp01 is limited by V_DS-n02 +
V_T-hvp ≈ 0.2 + 1.2 ≈ 1.4 V. The gate-to-source voltages of both hvp01 and hvn01 are therefore
well limited below 5 V. In other words, we utilize the SF characteristic where the source
voltage tracks the gate voltage and subsequently protects the thin-gate oxide.
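The voltage budget above can be restated as a small Python check. The numbers are those quoted in the paragraph; the reading of the two 0.2 V terms as drain-to-source drops, the 5 V limit constant and all variable names are assumptions made for this sketch.

```python
# Values quoted in the text, in volts.
V_DDL = 4.5        # LV supply level used in the estimate
V_DS_P01 = 0.2     # drop across p01 (assumed to be a drain-to-source drop)
V_GS_HVP01 = 1.2   # gate-to-source voltage of hvp01
V_REG = 4.5        # regulated node voltage used in the estimate
V_DS_N02 = 0.2     # drop across n02 (assumed to be a drain-to-source drop)
V_T_HVP = 1.2      # threshold voltage of the HV pMOS
V_GS_LIMIT = 5.0   # thin-gate oxide limit

v_g_hvn01 = V_DDL - V_DS_P01 + V_GS_HVP01  # gate voltage of hvn01
v_gs_hvn01 = v_g_hvn01 - V_REG             # gate-to-source voltage of hvn01
v_gs_hvp01 = V_DS_N02 + V_T_HVP            # gate-to-source voltage of hvp01

thin_gates_protected = max(v_gs_hvn01, v_gs_hvp01) < V_GS_LIMIT
```

Both gate-to-source voltages stay well under the 5 V limit, leaving a margin of more than 3 V in this estimate.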



3. Simulations and lab. measurements 

3.1 High current drive 

Fig. 6 shows V_reg versus V_DDH with V_DDL fixed at 5 V. The measurement shows that V_reg
becomes regulated when V_DDH exceeds 7 V. The line regulation of V_reg for V_DDH from 7 V to
30 V is 0.113 mV/V. Fig. 7 shows the transient simulations of the driver. The corresponding
lab measurements are shown in Figs. 8-11, and the die photo is shown in Fig. 12. The
measurements agree with the simulation results. When the power FET turns ON, the
transient output current rises from 0 to 100 mA in 525 ps, i.e., about 190 A/µs. When the
power FET turns OFF, the output sinking current is about 120 mA. With the large current
driving capability, the output can charge a 270 pF load within 100 ns. That is, the driver is
able to operate at up to 10 MHz even under heavy loading.
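The two headline figures follow from simple ratios: a 100 mA current step in 525 ps, and one charging interval of 100 ns per cycle. The Python lines below reproduce this arithmetic; the variable names are hypothetical, and treating the 100 ns charging time as the full switching period is the same simplification the text makes.

```python
DELTA_I_A = 100e-3        # output current step, in amperes
CURRENT_RISE_S = 525e-12  # time taken by the current step, in seconds
CHARGE_TIME_S = 100e-9    # time to charge the 270 pF load, in seconds

di_dt_a_per_us = DELTA_I_A / CURRENT_RISE_S * 1e-6  # current slew in A/us
max_switching_hz = 1.0 / CHARGE_TIME_S              # one charge per period
```

The current slew evaluates to about 190 A/µs and the frequency bound to 10 MHz, matching the figures quoted above.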
Fig. 6. V_reg vs V_DDH with V_DDL = 5 V

Fig. 7. Simulated transient responses with V_DDH = 30 V: (a) overall waveforms; (b) transient
current I_out and V_reg when charging up the output capacitor (gate capacitor of the power FET)

Fig. 12. Die photo of the proposed driver



3.2 Table of comparison 

The left part of Table II shows the performance comparison between the proposed driver
and other HV circuits with thin-gate protection. Our work features a small die area
(650 µm × 200 µm), high slew rate (52 V/µs), fast transient current (190 A/µs), and fast rise
(73.8 ns) and fall (17.5 ns) times. The right part of Table II shows the comparison of our work
with other drivers, including high-speed LV ones. Our work still features the smallest die
area, highest slew rate, and fastest rise and fall times among all CMOS implementations. The
bipolar implementation only shows fast rise and fall times under unloaded measurement, and
its bipolar nature makes it unattractive for implementation due to high cost.
| Parameter | This Work | Floating Gate Protection Technique (Chebli et al., 2007) | Low to High Voltage Digital Interface (Declercq et al., 1993) | High-Voltage CMOS Op Amp (Declercq et al., 1993) | Class AB Output Stage Op-Amp (Bales, 1997) | Class AB buffer amp with Slew Rate Enhancement (Lu & Lee, 2002) | Regulated Gate Driver (Tzeng & Chen, 2009) |
|---|---|---|---|---|---|---|---|
| Process | 0.5 µm | 0.8 µm | 2.0 µm | 2.0 µm | Bipolar | 0.6 µm | 0.5 µm |
| Die area | 0.13 mm² | 0.9 mm² | N/A | N/A | 0.8 mm² | N/A | 0.72 mm² |
| Dead-time circuit | Yes | No | No | No | No | No | No |
| Load | 270 pF | 100 pF | 30 pF | 1000 pF | N/A | 680 pF | 2400 pF |
| Slew rate | 52 V/µs | N/A | - | 15 V/µs | N/A | 2.41 V/µs | N/A |
| Rise time | 73.8 ns | 474 ns | 80 ns | - | 7 ns | 1.6 µs | ≈ 670 ns |
| Fall time | 17.5 ns | 445 ns | 80 ns | - | 7 ns | 1 µs | N/A |
| Maximum output current | 100 mA @charge, 120 mA @discharge | N/A | N/A | 20 mA @charge | 100 mA @charge | N/A | N/A |
| HV supply, V_DDH | 30 V | 60 V | 75 V | 75 V | N/A | N/A | 30 V |
| LV supply, V_DDL | 5 V | 5 V | 5 V | N/A | 5 V | N/A | N/A |
| Thin-gate oxide protected? | Yes | Yes | Yes | Yes | N/A | N/A | N/A |
| Power | 4 mW | 0.55 mW | N/A | N/A | >7.5 mW | 1 mW | 546 mW |

Table II. Performance comparison between this work and similar works



4. Conclusion 

A 7V-to-30V high-speed CMOS regulated driver for on-chip thin-gate power MOSFETs has
been developed. A small die area is achieved by minimizing the number of HV devices. The
HV devices are all of the thin-gate type, and the corresponding V_GS and driver output voltages
are constrained below 5 V by the SF circuit technique. The driver is capable of delivering a
fast transient current response of up to 190 A/µs when charging up the output capacitor. The
maximum charging and discharging currents are 100 mA and 120 mA, respectively, while the
quiescent current is kept below 90 µA in the static state for 7 V to 30 V applications. A
dead-time circuit is incorporated to save 75 mW of power loss due to shoot-through current.
In short, we have developed a thin-gate-protected fast driver using a standard HV CMOS
process for HV applications with a small die area. This topology is applicable at a 30 V
supply voltage or higher. The stability analysis for compensating this type of regulated driver
is also presented to provide useful insights and guidelines for driver IC design.
1. Introduction



Millimeter waves are electromagnetic waves with wavelengths of 1 to 10 mm in vacuum,
and they were discovered experimentally in the 19th century (Wiltse, 1984). In 1946, the
most distinctive feature of millimeter waves, oxygen absorption at 60 GHz, was reported; it
results in the rapid attenuation of electromagnetic waves in air (Beringer, 1946).
Although oxygen absorption makes long-distance wireless communication difficult, it
allows a wide frequency band to be allocated, which enables ultra-high-speed
communication at rates greater than 1Gbps (gigabits per second). Recently, this well-known
feature of millimeter-wave communication has attracted attention again, because
millimeter-wave circuits have been realized with advanced CMOS technologies and because
recent regulations have opened license-free bandwidths in the 60GHz band of 9GHz in
Europe and 7GHz in Japan, the USA, Canada and Korea. Many studies on millimeter-wave
CMOS circuits have been reported in academic conferences and journals in the past few
years, and consumer devices are expected to be available soon.

For consumer applications of millimeter waves, the reduction of power consumption is the
most important issue. The power-hungry building blocks in a transceiver are the local
oscillator (LO) based on a phase-locked loop (PLL) and the analog-to-digital and
digital-to-analog converters (ADC and DAC), as shown in Fig. 1(a) (Marcu, 2009). If these
blocks can be partially or completely eliminated from a transceiver, power consumption will
be considerably reduced. From this viewpoint, we have studied millimeter-wave pulse
communication for high-performance CMOS wireless transceivers, as shown in Fig. 1(b) and
Fig. 1(c). In this study, low-power direct pulse generators, high-speed switches and
receivers, which are the most important building blocks in millimeter-wave pulse
communication, are discussed for high-speed wireless communications using the 60 GHz
band. Finally, the prospects for millimeter-wave pulse communication are addressed.

2. 60GHz CMOS pulse transmitter 

In this section, three low-power 60GHz CMOS pulse transmitter circuits are presented. The
first is a carrier-less direct pulse generator circuit (Badalawa, 2007). The second is an
8Gbps millimeter-wave CMOS switch used as an Amplitude Shift Keying (ASK) modulator
(Oncu, 2008, b), and the last is a low-power 10Gbps






Advances in Solid State Circuits Technologies 



[Fig. 1 block labels: (a) data input, transmitter, range, receiver, data output; (b) data input, transmitter, range, receiver, data output; (c) data input, mm-wave CW source, transmitter switch, range, receiver, data output.]



Fig. 1. Block diagram of wireless communication based on (a) carrier modulation, (b) direct 
pulse generator without oscillator, (c) pulse generator with millimeter-wave oscillator. 

CMOS transmitter for a 60GHz millimeter-wave impulse radio, where a 60GHz
millimeter-wave continuous-wave (CW) source and ASK modulator circuits are embedded on
the same silicon substrate.



2.1 60GHz CMOS pulse generator design 

The circuit topology of the proposed pulse generator (PG) is shown in Fig. 2. This circuit has
a monopulse generator (MPG) cell that is composed of two CMOS inverters, which
contribute the delay, and two NMOS transistors, which produce the pulses by combining
edges, as shown in Fig. 3(a). Inverter A is driven by falling edges of the baseband data. Just
before the falling edge, NMOSFET C is "off" and NMOSFET D is "on". When the signal
passes through inverter A, NMOSFET C is turned "on" and the output node is discharged.
Next, when the input signal passes through inverter B, NMOSFET C is turned "off" and the
output node is charged by a pull-up inductor. At this moment, one pulse is produced
according to the propagation delay of inverter B. The transmitter can be implemented with
low power consumption using this topology, because the circuit is activated only when
falling edges of the input signal arrive from the baseband data. Since no power is consumed
at other times, the consumed power has a linear relationship with the input data rate.

Fig. 2. Circuit topology of a 60GHz CMOS pulse generator.
Fig. 3. (a) An edge combiner comprising MOSFETs C and D has to generate a 16ps pulse, (b)
Centre frequency as a function of NMOS width Wser over inverter width Winv.

To set the delay time per inverter to 8ps, which is equal to half the reciprocal of the
carrier frequency, it is essential to reduce the load capacitance of the transistors that are
connected to each inverter output node. To obtain a short delay time, the gate widths of the
NMOS and PMOS transistors in the inverter should be increased to obtain a large drain
current. Because the load capacitance connected to the inverter output node directly affects
the inverter delay, the size ratio between the inverter and the edge-combiner NMOSFET
must be chosen carefully to obtain a carrier frequency of 62.5GHz. Figure 3(b) shows the
centre frequency when the fan-out, that is, the edge-combiner NMOS width Wser over the
inverter width Winv, is varied from 0.01 to 1. To realize a 60GHz PG using this circuit
topology, the fan-out should be set to 0.1. Besides this optimization, selecting a CMOS
process with a small threshold voltage is one of the key points in implementing a 60GHz
pulse generator. Here, we choose the nine-metal TSMC 90nm CMOS process, whose
threshold voltage is half that of the CMOS process used for the 22-29GHz UWB CMOS pulse
generator circuit in (Fujishima, 2006).
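The timing target above can be checked with a short calculation. The sketch below is illustrative only: the function names are hypothetical and not part of the original design flow, and the code simply encodes the stated relation that the delay per inverter equals half the carrier period.

```python
def inverter_delay_for_carrier(carrier_hz):
    """Delay per inverter (s) when it equals half the carrier period."""
    return 0.5 / carrier_hz


def carrier_for_inverter_delay(delay_s):
    """Carrier frequency (Hz) produced by a given per-inverter delay."""
    return 0.5 / delay_s


if __name__ == "__main__":
    delay = inverter_delay_for_carrier(62.5e9)
    print(f"delay per inverter: {delay * 1e12:.1f} ps")
    print(f"one carrier period: {2 * delay * 1e12:.1f} ps")
    # Hypothetical slower inverter, to show how the carrier frequency drops.
    slower = carrier_for_inverter_delay(10e-12)
    print(f"carrier for a 10 ps delay: {slower / 1e9:.1f} GHz")
```

For a 62.5GHz carrier this reproduces the 8ps delay quoted in the text, and one full carrier period matches the 16ps pulse named in the caption of Fig. 3(a).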

The power spectrum must fit into a spectrum mask to meet regulations, as shown in Fig. 4.
In (Maruhashi, 2005), filtering is employed to satisfy the regulations, at the cost of
increased power consumption. To solve this problem, an all-digital low-power CMOS pulse
generator with 14 delay stages, which generates a pulse width of 224ps, is adopted. To
satisfy the power spectrum regulations without any filters, the monopulse amplitudes
within a single pulse are adjusted to four levels to approximate the ideal Gaussian power
spectrum by sizing the edge-combiner NMOSFETs, as shown in Fig. 5.

Fig. 4. Pseudo-raised cosine pulse for satisfying specified.
Fig. 5. MOSFET sizing for generating pseudo-raised cosine pulse.

Figure 6 shows a chip micrograph of the CMOS pulse generator, which has a die area of
590×380µm² and was fabricated in a 90nm CMOS process with nine metal layers. The
time-domain response of the pulse generator is shown in Fig. 7, where the 62.5GHz
operating frequency is observed at a supply voltage of 1.15V, and the four-level
approximation is confirmed.

Fig. 7. Measured transient response of 60GHz CMOS pulse generator.

Figure 8 shows the carrier frequency and output power as functions of supply voltage, and
the power consumption as a function of input data rate. The carrier frequency increases
with supply voltage, reflecting the inverse relationship between inverter delay and supply
voltage, while the output power is almost unchanged when the supply voltage is higher than
0.7V. The linear dependence of power consumption on input data rate is confirmed by the
measurement data. Since power is consumed only at falling edges of the input signal, a low
average power consumption is observed at 1.5Gbps compared with those in (Maruhashi,
2005; Nakakita, 1997). The power consumption of the proposed pulse generator is 11.5mW
at a supply voltage of 1.15V.

Fig. 8. (a) Carrier frequency and output power as a function of supply voltage and (b) power
dissipation as a function of input data rate.
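The pulse shaping described above (14 delay stages, a 224ps pulse, four amplitude levels) can be illustrated with a small sketch. It assumes that each stage contributes one 16ps monopulse (224ps / 14) and that the target envelope is a raised cosine, as suggested by the captions of Fig. 4 and Fig. 5. The quantizer and the function names are assumptions made for illustration; the actual transistor sizing of the design is not given in the text.

```python
import math


def pulse_width(stages, stage_delay_s):
    """Total pulse width when each delay stage contributes one monopulse."""
    return stages * stage_delay_s


def four_level_envelope(stages, levels=4):
    """Quantize a raised-cosine envelope, sampled at each stage centre,
    to integer amplitude levels 1..levels (assumed quantizer)."""
    result = []
    for k in range(stages):
        x = (k + 0.5) / stages                        # normalized position in the pulse
        envelope = 0.5 * (1.0 - math.cos(2.0 * math.pi * x))
        result.append(max(1, round(envelope * levels)))
    return result


if __name__ == "__main__":
    print(f"pulse width: {pulse_width(14, 16e-12) * 1e12:.0f} ps")
    print("relative monopulse amplitudes:", four_level_envelope(14))
```

The resulting staircase rises from the lowest level at the pulse edges to the highest level at the centre, which is the qualitative shape that the edge-combiner sizing is meant to produce.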






In this section, a millimeter-wave pulse generator was studied. By designing the pulse
generator with digital circuits, a 60GHz millimeter-wave pulse can be generated without
using a power-hungry LO. As a result, the pulse generator consumes a small amount of
power that is proportional to the input data rate. However, the RF power achievable with
this architecture depends strongly on the technology used. We concluded that advanced
CMOS processes with shorter channels would provide better speed and RF power
performance. In the following sections, we study pulse generator architectures consisting of
a low-power millimeter-wave ASK modulator and a 60GHz oscillator in a standard CMOS
process of the kind generally used for digital processor design.

2.2 8Gbps 60GHz CMOS ASK modulator 

A millimeter-wave CMOS impulse radio with an ASK modulator, as shown in Fig. 9, is
promising for low-cost and low-power wireless communication. In the transmitter, a digital
switch controls a millimeter-wave CMOS ASK modulator. This architecture is less sensitive
to the CMOS technology used than a direct millimeter-wave pulse generator. The receiver
receives 60GHz pulses and converts them to a digital signal (Oncu, 2008, a; Lee, 2009). In
this section, we study the design of an 8Gbps CMOS ASK modulator for a 60GHz
millimeter-wave impulse radio.



[Fig. 9 block labels: TX (CMOS): CMOS digital circuitry (over-Gbps digital data), oscillator, 60GHz ASK modulator (this work), antenna; over-Gbps 60GHz pulses; RX (CMOS): antenna, 60GHz pulse receiver, CMOS digital circuitry (over-Gbps digital data).]



Fig. 9. Block Diagram of millimeter-wave impulse radio with a 60GHz ASK (Amplitude 
Shift Keying) modulator. 

Figure 10(a) shows a conventional millimeter-wave ASK modulator in CMOS (Chang, 2007).
It consists of an oscillator and a buffer. Millimeter-wave pulses are obtained by turning the
biasing on and off. Although this architecture has high isolation when the biasing is turned
off, the switching speed is limited by the energy stored in the oscillator tank. High-speed
distributed traveling-wave millimeter-wave ASK modulators in compound semiconductors
have also been reported (Mizutani, 2000; Ohata, 2000; Ohata, 2005; Kosugi, 2003; Kosugi,
2004). They were realized using distributed shunt switches between the signal line and the
ground line of a transmission line, as shown in Fig. 10(b). In this architecture, when the
switches are off, the input signal is transferred to the output and the ASK modulator is in the
ON state. On the other hand, when the switches are turned on, no input signal is transferred
to the output and the ASK modulator is in the OFF state. The distributed structure requires a
large number of switches, since the resistances of the switches in the OFF state of the
modulator should be small to realize a lossy transmission line.

Fig. 10. Architectures of conventional (a) high-isolation and (b) high-speed ASK modulators.



2.2.1 Millimeter-wave CMOS ASK modulator design 

A possible distributed CMOS modulator is shown in Fig. 11(a). However, the low-quality
parasitic capacitances of the switches, which are located on a silicon substrate, are expected
to degrade the transmission line characteristics. In this study, a reduced-switch architecture
is used for a high-speed millimeter-wave CMOS ASK modulator, as shown in Fig. 11(b).
Note that the isolation degrades as the number of switches is reduced, since each switch
leaks to the output. To achieve high isolation with a reduced number of switches, the
transmission line length between switches is adjusted. When the millimeter-wave signal
travels from the source to the load, the switches not only dissipate the incident signal but
also reflect and leak it, as shown in Fig. 12. Note that, in a transmission line, impedance
transformation between the two terminals occurs, as shown in Fig. 13(a). In Fig. 13(b), the
calculated leaked, reflected and dissipated powers are shown as a function of the distance
between switches. Since the power dissipated in the switches is insensitive to the
transmission line length, reflection should be maximized to minimize the leakage. To obtain
maximum reflected power and minimum leaked power, the switches are separated by a
quarter-wavelength distance. In this case, the isolation is maximized with a small number of
switches.

Fig. 11. Architectures of (a) distributed and (b) reduced-switch ASK modulators in CMOS
process.
Fig. 12. Illustration of transmitted, reflected, dissipated and leaked signals of a switch in the
(a) ON and (b) OFF states of the modulator when the millimeter-wave signal travels from
source to the load.
Fig. 13. (a) Impedance transformation along the modulator and (b) calculated reflected,
dissipated and leaked powers as a function of the transmission line distance between switches.
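The quarter-wavelength argument can be made concrete with the standard input-impedance formula of a lossless transmission line. The sketch below is a minimal illustration, not the calculation behind Fig. 13: the 50Ω line impedance and the 5Ω switch on-resistance are assumed values chosen only to show the effect.

```python
import math


def line_input_impedance(z_load, z0, electrical_length_rad):
    """Input impedance of a lossless line of characteristic impedance z0
    terminated in z_load, for a given electrical length in radians."""
    c = math.cos(electrical_length_rad)
    s = math.sin(electrical_length_rad)
    return z0 * (z_load * c + 1j * z0 * s) / (z0 * c + 1j * z_load * s)


if __name__ == "__main__":
    z0 = 50.0                  # assumed characteristic impedance
    r_on = 5.0                 # hypothetical on-resistance of a shunt switch
    quarter_wave = math.pi / 2

    # Switch on: the low resistance is transformed to roughly z0**2 / r_on.
    print(abs(line_input_impedance(r_on, z0, quarter_wave)))
    # Switch off (negligible loading): a matched load is seen unchanged.
    print(abs(line_input_impedance(z0, z0, quarter_wave)))
```

With these assumed values the low switch resistance appears as a high impedance one quarter wavelength away, so the next switch sees a strongly mismatched source and most of the incident power is reflected rather than leaked.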

A 60GHz CMOS ASK modulator is designed with three NMOSFET switches and two
quarter-wavelength transmission lines, as shown in Fig. 14. When the digital input is 0V, the
NMOSFET switches are turned off. Since the parasitic capacitance of each switch in the OFF
state is negligible, the input impedance of each transmission line is equal to the load
impedance and the input power is transferred to the output. When the digital input is 1V,
the switches are turned on. The quarter-wavelength transmission line transforms the
low impedance of the switch to a high impedance, and reflection is maximized. In this case,
the power leaked to the output is minimized and high isolation is achieved.

Fig. 14. Circuit schematic of the CMOS ASK modulator for 60GHz wireless communication.

Millimeter-wave NMOSFET models are established by extracting the parasitic components
based on on-wafer measurements (Doan, 2005). The slow-wave transmission line (SWTL)
(Cheung, 2003) shown in Fig. 15 is used to implement the quarter-wavelength
transmission lines and the networks between the circuit and the pads, in order to reduce the
size of the modulator. In the SWTL, a slotted ground shield under the signal line is laid
orthogonal to the direction of the signal current flow. This structure lowers the phase
velocity of the propagating waves; thus, the corresponding wavelength at a given frequency
is reduced. A quarter wavelength is obtained using a 450-µm-long SWTL. Note that the
quarter wavelength would be 850µm if a microstrip line (MSL) were used.
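The size saving of the slow-wave line can be expressed as a phase-velocity ratio. The following sketch uses only the two quarter-wavelength figures quoted above and the relation v = 4·L·f for a line that is a quarter wavelength long at frequency f; the function name is an illustrative choice.

```python
def phase_velocity(quarter_wave_length_m, freq_hz):
    """Phase velocity implied by a quarter-wavelength line: v = 4 * L * f."""
    return 4.0 * quarter_wave_length_m * freq_hz


FREQ_HZ = 60e9
v_swtl = phase_velocity(450e-6, FREQ_HZ)   # slow-wave transmission line
v_msl = phase_velocity(850e-6, FREQ_HZ)    # microstrip line
slow_wave_factor = v_msl / v_swtl          # how much shorter the SWTL is

if __name__ == "__main__":
    print(f"SWTL phase velocity: {v_swtl:.3e} m/s")
    print(f"MSL phase velocity:  {v_msl:.3e} m/s")
    print(f"length reduction factor: {slow_wave_factor:.2f}")
```

Under these figures the slow-wave structure shortens each quarter-wavelength section by a factor of about 1.9 compared with a microstrip line.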
Gate resistors of 200Ω are inserted to ensure sufficiently high-speed operation. Transient
internal waveforms are simulated as shown in Fig. 16. A 200ps pulse is applied from the
data port to analyze the response of the circuit. The total time of the rising and falling gate
voltages is estimated as 125ps, which corresponds to a maximum data rate of 8Gbps. The
60GHz millimeter-wave ASK modulator is fabricated in a 6-metal 1-poly 90nm CMOS
process. The cutoff frequency ft and the maximum operation frequency of the nMOSFET are
130GHz and 150GHz, respectively. Figure 17 shows a micrograph of the fabricated ASK
modulator. The size of the chip is 0.8mm x 0.48mm including the pads. The core size is
0.61mm x 0.3mm.

Fig. 16. Transient simulation; (a) 200ps applied data pulse, and responses of (b) the gate
voltage of the NMOSFET switch, and (c) input and (d) output signals.




Fig. 17. Micrograph of the fabricated chip. 

2.2.2 Experimental result and discussion 

On-wafer two-port measurements were performed up to 110GHz using an Anritsu ME7808
network analyzer with transmission reflection modules. The ON and OFF states were set by
applying DC voltages of 0V and 1V to the gate terminal, respectively. The measured and
simulated insertion losses of the modulator for the two states are shown in Fig. 18(a) for
comparison. The insertion losses in the ON and OFF states are 6.6dB and 33.2dB,
respectively, at 60GHz. Isolation is defined as the insertion loss difference between the ON
and OFF states, which is 26.6dB. The isolation is nearly flat from 20 to 80GHz, although the
maximum isolation is measured at 60GHz. As a result, shorter transmission lines may be
adopted to reduce the insertion loss caused by the SWTL in the ON state of the modulator.

Fig. 18. Measured and simulated (a) insertion loss (S21) of the ASK modulator for ON and
OFF states and (b) isolation of the ASK modulator.

The simulated isolation is shown at frequencies up to 350GHz in Fig. 18(b) to demonstrate
the frequency behaviour of the modulator. The maximum isolation appears at 60GHz, where
the electrical length of the transmission lines is λ/4 and λ is the wavelength. Further local
maxima in the OFF-state insertion loss occur at 180GHz and 300GHz, which correspond to
3λ/4 and 5λ/4, respectively.
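The frequencies quoted above follow from the line being an odd multiple of a quarter wavelength. A minimal sketch, assuming a dispersion-free line (constant phase velocity, which a real slow-wave line only approximates) and using a hypothetical helper name:

```python
def odd_quarter_wave_frequencies(f_quarter_hz, count):
    """Frequencies at which a line that is a quarter wavelength long at
    f_quarter_hz is an odd multiple of a quarter wavelength (1/4, 3/4, 5/4, ...)."""
    return [(2 * n + 1) * f_quarter_hz for n in range(count)]


if __name__ == "__main__":
    for f in odd_quarter_wave_frequencies(60e9, 3):
        print(f"{f / 1e9:.0f} GHz")
```

For a line that is λ/4 long at 60GHz, the first three such frequencies are 60GHz, 180GHz and 300GHz, in agreement with the simulated maxima.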

The time-domain response is measured using a 70GHz sampling oscilloscope, a 60GHz
millimeter-wave source module and a pattern generator. No external filters are applied in
the measurement. A 60GHz continuous wave is applied to the RF input and the modulator
is controlled by the pattern generator. The rise and fall times of the applied baseband
signal are 6ps and 8ps, respectively. The output response for the maximum data rate is
shown in Fig. 19(a). In Fig. 19(b), the output response to a single 125ps baseband pulse is
shown with the time scale reduced to 20ps.

Fig. 19. Measured output response of the modulator for (a) an 8Gbps data train and (b) a
single 125ps data pulse.

The maximum data rates as a function of the isolation of the millimeter-wave ASK 
modulators are shown in Fig. 20. It can be seen that the isolation and the maximum data rate 
have a tradeoff relationship. The product of the maximum data rate and the isolation of this 
modulator is 170GHz, which is the highest value among multi-Gbps ASK modulators. 
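The figures quoted in this section are mutually consistent, as the short check below shows. It assumes that the data rate is the reciprocal of the 125ps gate switching time and that the isolation enters the product as a linear voltage ratio (10^(dB/20)). That second assumption is not stated in the text; it is chosen here because it reproduces the quoted value of about 170.

```python
def max_data_rate_gbps(switching_time_s):
    """Maximum data rate (Gbps) if one bit needs one full switching time."""
    return 1.0 / switching_time_s / 1e9


def voltage_ratio(db):
    """Convert a decibel value to a linear voltage ratio."""
    return 10.0 ** (db / 20.0)


def rate_isolation_product(rate_gbps, isolation_db):
    """Data rate times linear isolation (assumed definition of the product)."""
    return rate_gbps * voltage_ratio(isolation_db)


isolation_db = 33.2 - 6.6                  # OFF-state minus ON-state insertion loss
rate_gbps = max_data_rate_gbps(125e-12)
product = rate_isolation_product(rate_gbps, isolation_db)

if __name__ == "__main__":
    print(f"isolation: {isolation_db:.1f} dB")
    print(f"maximum data rate: {rate_gbps:.1f} Gbps")
    print(f"rate-isolation product: {product:.0f}")
```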



2.3 12.1 mW 10Gbps pulse transmitter for 60GHz wireless communication 

In this section, we present the design of a low-power 10Gbps CMOS transmitter (TX) for a
60GHz millimeter-wave impulse radio, in which a 60GHz millimeter-wave CW source and
ASK modulator circuits are embedded on the same silicon substrate, as shown in Fig. 21. An
8Gb/s CMOS ASK modulator for 60GHz wireless communication was studied in Section 2.2.
That single-pole-single-throw (SPST) reduced-switch NMOSFET architecture is capable of
high-speed operation without DC power dissipation. Its isolation was maximized using
quarter-wavelength transmission lines; the resulting long lines, however, make the insertion
loss high. Figure 22(a) shows a TX configuration that consists of an off-chip 60GHz
millimeter-wave CW source and an on-chip CMOS modulator. An off-chip millimeter-wave
source module increases the size, the total power consumption and the cost of the TX
system, so the oscillator should be embedded in the CMOS chip for practical applications.
Millimeter-wave CMOS oscillators are commonly designed as differential-ended circuits
(Huang, 2006). In this design, a differential-ended CMOS oscillator was designed as the
60GHz CW source. To utilize the differential-ended output signal, a double-pole-single-throw
(DPST) switch is proposed for the modulator, as shown in Fig. 22(b).

Fig. 21. Block diagram of gigabit millimeter-wave wireless pulse communication in
CMOS.



2.3.1 60GHz pulse transmitter design 

2.3.1.1 60GHz CMOS CW Signal Source Design 

Figure 23 shows the schematic of the on-chip 60GHz CW source circuit, which consists of two
sub-blocks: a 60GHz oscillator and a buffer. The oscillator generates a 60GHz CW signal and
the buffer drives the ASK modulator. The 60GHz oscillator contains an on-chip
transmission-line resonating tank with a MOS capacitor, and two cross-coupled MOSFETs
that realize a negative conductance in parallel with the tank. The device sizes were chosen by
considering the parasitics and the process variations so as to keep the resonance in the
60GHz millimeter-wave band. The active device and MOS capacitor models were obtained
from the foundry. The transmission lines were characterized by 3D full-wave
electromagnetic field simulation using the High Frequency Structure Simulator (HFSS).

60GHz CW source and (b) a proposed differential-ended pulse transmitter with on-chip
60GHz CW source.
Fig. 23. Circuit schematic of a 60GHz millimeter-wave continuous-wave (CW) source.

The bias voltage affects not only the negative conductance but also the power consumption.
A high supply voltage results in high power dissipation. Although a maximum supply
voltage of 1.2V is allowed in this CMOS process, simulation with SpectreRF shows that
oscillation starts when the supply voltage is approximately 0.9V. A margin of 0.1V was
added, and the supply voltage was set to 1V for low-power operation.

2.3.1.2 Millimeter-wave Differential Ended CMOS ASK Modulator Design 

Figure 24 shows the 60GHz differential-ended CMOS ASK modulator. It is designed as a
DPST switch consisting of two SPST switches connected in parallel. The inputs are connected
to the complementary outputs of the on-chip 60GHz signal source. The gates of the switches
are controlled by binary data. Each SPST switch is designed with two NMOSFET switches
and a transmission line, TL1, as shown in Fig. 24. When the digital input is 0V, the
NMOSFET switches are turned off. Since the parasitic capacitance of each switch in the OFF
state is negligible, the input impedance of each transmission line is equal to the load
impedance and the input power is transferred to the output, as shown in Fig. 12(a) of
Section 2.2. When the digital input is 1V, the switches are turned on. The transmission line
transforms the low impedance of the switch to a high impedance, and reflection is increased.
In this case, the power leaked to the output is reduced and the isolation is improved, as
shown in Fig. 12(b) of Section 2.2.

Fig. 24. Circuit schematic of the differential-ended ASK modulator for 60GHz
millimeter-wave pulse transmitter.

The isolation is theoretically maximized when the switches are separated by a
quarter-wavelength transmission line; however, long transmission lines result in higher
insertion loss. In Section 2.2, the isolation was maximized with two quarter-wavelength
transmission lines whose total length is 900µm, which resulted in an insertion loss of 6.6dB.
The isolation there is nearly flat from 20 to 80GHz, although the maximum isolation is
measured at 60GHz. As a result, shorter transmission lines may be adopted to reduce the
insertion loss caused by the on-chip transmission line in the ON state of the modulator. In
this CMOS technology, the length of a quarter-wavelength transmission line is 600µm. We
designed the switch with a 300µm-long transmission line, with which the isolation degrades
slightly but the insertion loss improves.

2.3.2 60GHz pulse transmitter measurement and discussions

The proposed pulse transmitter and the test circuits of the 60GHz millimeter-wave source
and the ASK modulator were fabricated in an 8-metal 1-poly 90nm CMOS process with a
rewiring layer fabricated by a wafer-level chip-scale package (W-CSP) process. Figure 25
shows a micrograph of the pulse transmitter chip. In this design, the pitch of the
radio-frequency and biasing pads is 150µm.







Fig. 25. Micrograph of the fabricated 60GHz pulse transmitter chip. 

2.3.2.1 60GHz CW signal source 

The spectrum of the 60GHz CW signal source was measured using an Agilent E4407B
spectrum analyzer and an Agilent 11970V 50-75GHz harmonic mixer. A 60GHz
continuous-wave signal was measured at the output of the circuit; its spectrum is shown in
Fig. 26. In this measurement setup, the total power loss of the probe, cables, connectors and
harmonic mixer is approximately 42dB. The fabricated chip was observed to start oscillating
when the bias voltage is larger than 0.7V. The measured operating frequency as a
function of supply voltage is plotted in Fig. 27(a). Figure 27(b) shows the power dissipation
and millimeter-wave RF power as functions of the supply voltage from 0.7V to 1.4V. As the
supply voltage increases, the power dissipation increases rapidly. However, the
millimeter-wave output power saturates when the supply voltage approaches 1V.

Fig. 26. Measured output spectrum of the 60GHz CW source.
Fig. 27. Measured (a) operating frequency of the oscillator and (b) power dissipation and
output millimeter-wave power of the oscillator as a function of supply voltage.

The power dissipation was measured to be 19.2mW at the maximum allowed supply voltage
of 1.2V. We reduced the supply voltage to 1V for low-power operation, at which the
millimeter-wave output power was measured to be -20.7dBm with a power dissipation of
12.1mW. In this study, we found that our layout-versus-schematic verification software had
not been functioning properly while we were designing the circuit, which was our first
design in this 90nm CMOS technology. The core of the oscillator operates properly; however,
because of the verification error in the layout, the buffer attenuates the generated
millimeter-wave signal by 18dB although it was designed to have 10dB gain.
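The power figures above can be put on a common scale with a few decibel conversions. The sketch below uses only the measured values quoted in the text. The projected output power is a rough estimate that assumes the buffer would have delivered its full designed gain with no compression, which the text does not claim.

```python
import math


def dbm_to_mw(dbm):
    """Convert power in dBm to milliwatts."""
    return 10.0 ** (dbm / 10.0)


def mw_to_dbm(mw):
    """Convert power in milliwatts to dBm."""
    return 10.0 * math.log10(mw)


measured_dbm = -20.7             # output power at a 1V supply
dc_power_mw = 12.1               # power dissipation at a 1V supply
buffer_design_gain_db = 10.0     # intended buffer gain
buffer_measured_gain_db = -18.0  # observed buffer attenuation

rf_power_mw = dbm_to_mw(measured_dbm)
dc_to_rf_efficiency = rf_power_mw / dc_power_mw
shortfall_db = buffer_design_gain_db - buffer_measured_gain_db
projected_dbm = measured_dbm + shortfall_db   # idealized, linear assumption

if __name__ == "__main__":
    print(f"RF output power: {rf_power_mw * 1e3:.1f} uW")
    print(f"DC-to-RF efficiency: {dc_to_rf_efficiency * 100:.3f} %")
    print(f"buffer shortfall: {shortfall_db:.0f} dB")
    print(f"projected output with the intended buffer gain: {projected_dbm:.1f} dBm")
```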

2.3.2.2 Millimeter-wave CMOS ASK Modulator

Fig. 28. Measured (a) insertion loss (S21) and (b) reflection loss (S11) of the ASK modulator
for ON and OFF states, plotted against frequency [GHz].

The scattering parameters of the ASK modulator test circuit were measured on-wafer up to
110GHz using an Anritsu ME7808 network analyzer with transmission reflection modules
for the ON and OFF states. The measured insertion losses of the modulator for the
two states are shown in Fig. 28(a). When the gate voltage is 0V, the insertion loss was
measured to be 2.3dB at 60GHz. When the gate voltage was increased to VDD, the
insertion loss became 25.8dB; the isolation, which is defined as the insertion loss difference
between the ON and OFF states, was therefore calculated to be 23.5dB at 60GHz. Figure
28(b) shows the measured reflection loss of the modulator for the two states. When the
modulator is ON, S11 is lower than -10dB up to 75GHz, and it was measured to be -16.2dB
at 60GHz, where the modulator is matched to a 50Ω system. When the modulator was
turned off by increasing the gate voltage, S11 became -5.2dB. The maximum data rates as a
function of the isolation of the millimeter-wave ASK modulators are shown in Fig. 29. It can
be seen that the isolation and the maximum data rate have a tradeoff relationship. The
product of the maximum data rate and the isolation of this modulator is slightly less than
that of the previous work in Section 2.2, but its maximum data rate is increased by 2Gbps
and the insertion loss is improved by 4.3dB.
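The measured S-parameters can be translated into more intuitive quantities. The sketch below uses standard decibel conversions (magnitude of the reflection coefficient as 10^(S11/20), reflected power fraction as 10^(S11/10)) together with the values quoted above; the helper names are illustrative.

```python
def reflection_magnitude(s11_db):
    """Magnitude of the reflection coefficient for a given S11 in dB."""
    return 10.0 ** (s11_db / 20.0)


def reflected_power_fraction(s11_db):
    """Fraction of incident power that is reflected for a given S11 in dB."""
    return 10.0 ** (s11_db / 10.0)


def isolation_db(off_loss_db, on_loss_db):
    """Isolation as the difference between OFF- and ON-state insertion loss."""
    return off_loss_db - on_loss_db


if __name__ == "__main__":
    print(f"isolation: {isolation_db(25.8, 2.3):.1f} dB")
    for state, s11 in (("ON", -16.2), ("OFF", -5.2)):
        print(f"{state}: |reflection| = {reflection_magnitude(s11):.2f}, "
              f"reflected power = {reflected_power_fraction(s11) * 100:.1f} %")
```

In the ON state only a few percent of the incident power is reflected, whereas in the OFF state roughly a third is reflected back toward the source, consistent with a modulator that blocks the signal mainly by reflection.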



[Fig. 29 plot: maximum data rate versus isolation [dB] (axis ticks 10 to 60) for this work (60GHz), (Oncu, 2008, b) (60GHz), (Kosugi, 2003 & 2005), (Ohata, 2005) (60GHz), (Ohata, 2000) (60GHz), (Mizutani, 2000) (60GHz) and (Chang, 2007) (46GHz); legend: compound semiconductor, CMOS.]



Fig. 29. Maximum data rates as a function of isolation of the ASK modulators. 

2.3.2.3 60GHz Pulse Transmitter 

The time-domain response of the pulse transmitter was measured using an Agilent
Infiniium DCA 86100B wide-bandwidth oscilloscope with an Agilent 86118A 70GHz remote
sampling module. The chip was measured on-wafer, with the output connected to the
sampling oscilloscope through an on-wafer probe and cables. The measurements were
performed without any external filters at the output. The internal impedance of the
measurement equipment is 50Ω. Figure 30(a) and Fig. 30(b) show the output responses for
1Gbps and 10Gb/s, respectively. The baseline varies because the high-speed binary
baseband signal leaks from the gate. The leakage becomes especially strong at 10GHz, but it
will not distort the transmitted millimeter-wave signal, since the baseband leakage will be
filtered out by the 60GHz-band antenna. The RF power can be measured from the
time-domain response shown in Fig. 30. The maximum peak-to-peak voltage was measured
to be 45mV for a 50Ω load impedance, which corresponds to a peak power of -23dBm. Using
this circuit, short-range wireless or proximity communication at up to 10Gbps can be
realized with a power dissipation of 12.1mW. Our study showed that, with a proper buffer
design and improved layout verification, the output RF power could be increased to a few
dBm at an additional cost of a few tens of mW of power dissipation for longer-range
applications.

Fig. 30. Measured output response of the transmitter for (a) 1Gb/s and (b) 10Gb/s data
trains.
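The conversion from the measured peak-to-peak voltage to the quoted power can be reproduced as follows. The sketch assumes a sinusoidal 60GHz carrier within the pulse, so that the power is the RMS power of a sine wave, P = Vpp² / (8R). The text does not state which definition was used, but this one reproduces the quoted -23dBm.

```python
import math


def sine_power_dbm(v_peak_to_peak, load_ohm=50.0):
    """Power in dBm of a sine wave with the given peak-to-peak voltage
    across a resistive load."""
    v_rms = v_peak_to_peak / (2.0 * math.sqrt(2.0))
    power_w = v_rms ** 2 / load_ohm
    return 10.0 * math.log10(power_w / 1e-3)


if __name__ == "__main__":
    print(f"{sine_power_dbm(0.045):.1f} dBm")
```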

3. 60GHz CMOS pulse receiver 

In the past few years, millimeter-wave quadrature amplitude modulation (QAM) receiver
circuits in short-channel standard CMOS processes have been reported with data rates of
several Gbps and better energy-per-bit efficiency than WLAN and UWB (Pinel, 2007).
Conventional QAM receivers downconvert the received millimeter-wave signal to baseband
using one or two voltage-controlled oscillator (VCO) and phase-locked loop (PLL) circuits.
However, these building blocks consume several tens of mW. Additionally, the total power
consumption increases further when an analog-to-digital converter and a high-speed
demodulator are used, particularly when the data rate exceeds 1Gbps. By removing these
power-hungry building blocks, 2Gbps and 5Gbps millimeter-wave CMOS impulse radio
receivers were developed with better power efficiency. The 2Gbps receiver detects
millimeter-wave single-ended pulses using a single-ended CMOS envelope detector, and the
high-speed data is processed using only a limiting amplifier. The second receiver design
contains a differential envelope detector, a voltage-controlled amplifier and a current-mode
offset canceller, and the data is processed using a high-speed comparator with hysteresis. In
this section, the 2Gbps and 5Gbps millimeter-wave CMOS impulse radio receivers are
studied.



3.1 19.2mW 2Gbps CMOS pulse receiver 

The general architecture of conventional millimeter-wave QAM receivers is shown in Fig.
31(a), where the received signal is downconverted using a local oscillator (LO) that
consumes several tens of mW (Razavi, 2007; Mitomoto, 2007). The total power dissipation
increases further when a high-speed analog-to-digital converter (ADC) and a high-speed
demodulator (DMOD) are used, particularly at multi-Gbps data rates. Instead of using an
LO, an ADC and a DMOD, a low-power CMOS pulse receiver is proposed in this work for
multi-Gbps wireless communication, as shown in Fig. 31(b). The architecture is adopted
from that of optical communication receivers because of the similarity between an optical
pulse and a millimeter-wave pulse. In the following sections, the pulse receiver design and
the measurement results are presented.

Fig. 31. Architectures of (a) a conventional 60GHz receiver and (b) the proposed 60GHz
pulse receiver.



3.1.1 19.2mW 2Gbps CMOS pulse receiver design 

Multi-Gbps communication can achieve low power consumption when the received signal is
detected without a high-frequency LO and the high-speed data are processed using only a
limiting amplifier (LA), as shown in Fig. 31(b). Figure 32(a) shows the widely used optical
receiver architecture (Narasimha, 2007; Le, 2004). Adopting a similar principle, the
60GHz-band CMOS pulse receiver used to investigate this concept is shown in Fig. 32(b).
A low-noise amplifier (LNA) is deliberately omitted in this work so that the inherent
features of the millimeter-wave pulse receiver can be determined. As a result, the receiver
consists of a nonlinear amplifier (NLA), a five-stage LA, an offset canceller and an output
buffer. To detect millimeter-wave pulses, a metal-insulator-insulator-metal (MIIM) diode
(Rockwell, 2007) or a Schottky diode (Sankaran, 2005) has conventionally been used. However,
the MIIM diode requires a special CMOS process, which increases the cost of the pulse
receiver, and a Schottky diode is not always available under general design rules. To
overcome this issue, a common-source amplifier, which exploits the square-law relationship
between the drain current Id and the gate voltage Vg of an NMOSFET, is used as the detector.
In the NLA, Vg is adjusted to maximize $\partial^2 I_d/\partial V_g^2$ so that the envelope
of the millimeter-wave pulses is detected efficiently. The baseband signal is generated at
the output of the NLA, as shown in Fig. 33. The remainder of the circuitry is designed in
the same way as in similar types of optical receivers.
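
The square-law detection principle described above can be illustrated numerically. The
following is a minimal sketch, assuming an idealized device law Id = K(Vg - Vt)^2; the
constants K and VT, the bias voltage and the carrier amplitude are illustrative values and
are not taken from the fabricated circuit. The low-pass action of the following stages is
approximated by averaging over whole carrier cycles, which leaves a baseband term of
K*amp^2/2 only while a pulse is present.

```python
import math

K = 1e-3   # assumed transconductance parameter [A/V^2], illustrative only
VT = 0.4   # assumed threshold voltage [V], illustrative only

def drain_current(vg):
    """Idealized square-law drain current; zero below threshold."""
    vov = vg - VT
    return K * vov * vov if vov > 0 else 0.0

def mean_current(pulse_on, v_bias=0.6, amp=0.1, cycles=30, samples_per_cycle=16):
    """Drain current averaged over whole carrier cycles for one bit slot."""
    n = cycles * samples_per_cycle
    total = 0.0
    for i in range(n):
        carrier = amp * math.sin(2 * math.pi * i / samples_per_cycle)
        total += drain_current(v_bias + (carrier if pulse_on else 0.0))
    return total / n

# Baseband step produced by the presence of a millimeter-wave pulse.
baseband_step = mean_current(True) - mean_current(False)
print(baseband_step)  # close to K * amp**2 / 2
```

The step scales with the square of the carrier amplitude, which is why the detector output
follows the pulse power rather than the pulse voltage.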

Fig. 32. (a) Typical optical receiver architecture and (b) block diagram of the receiver
realized in this work.

Fig. 33. Nonlinear pulse detection using a common-source amplifier.






3.1.2 Measurement and discussions 

The receiver was fabricated in a 90nm CMOS process. A micrograph of the receiver is
shown in Fig. 34. The millimeter-wave switch of Section 2.2 was used for the measurement. A
60GHz continuous-wave (CW) signal applied to the switch input is modulated using the
pattern generator of a bit-error-rate tester (BERT). To filter out baseband fluctuations
due to switching, a V-band waveguide is inserted between the transmitter and the receiver.
Before the pulses are applied to the receiver input, the average pulse power is measured
using a millimeter-wave power meter. The 60GHz pulses and the demodulated digital signals
at a data rate of 2Gbps are shown in Fig. 35. The eye diagram and bit-error rate
(BER) of the receiver are obtained using $2^{31}-1$ bits of pseudo-random data. The eye
diagram of the receiver is shown in Fig. 36 for data rates of 1 and 2Gbps. In both cases,
clear eye openings are observed. The output was 313mV peak to peak. The measured BER is
plotted against the average pulse power in Fig. 37 for the 1 and 2Gbps data rates.
Theoretical BER curves for square-law detection are fitted to the measured data, and the
shapes of the measured curves agree with square-law detection theory. The BER of the pulse
receiver decreases more rapidly with increasing input power than that of a linear-detection
receiver.

Fig. 34. Micrograph of the pulse receiver.

Fig. 37. Bit error rate with $2^{31}-1$ random bits of data at 1 and 2Gbps data rates.

The total power consumption of the pulse receiver including the buffer is 19.2mW. To
compare this receiver with optical receivers, a figure of merit (FOM) is defined as
G·DR/Pdc, where G is the power gain, DR is the data rate, and Pdc is the power
consumption. The product of G and DR is plotted as a function of Pdc in Fig. 38,
where the FOMs are given by the slope. The FOM of this receiver is slightly better than
those of other reported optical receivers. Measurement of the scattering parameters showed
that suitable input matching would increase the power gain by 4.9dB. The receiver is also
compared with recently reported millimeter-wave receivers in Table 1. Note that the
proposed pulse receiver provides digital codes at its output with a power consumption of
only 19.2mW.

[Plot of Fig. 38: horizontal axis Pdc [mW], logarithmic from 10 to 10000; data points
labelled "this work", (Werker, 2004), (Le, 2004), (Palermo, 2007), (Chen, 2006),
(Krishnapura, 2005), (Seidl, 2004), (Narasimha), (Radovanovic, 2004) and (Swoboda, 2006).]

Fig. 38. Product of gain and data rate as a function of power dissipation for the receivers in
this work and previously reported optical receivers.

A low-power 60GHz-band CMOS pulse receiver was proposed for multi-Gbps wireless
communication. Using a 90nm 1P6M standard CMOS process, the proposed pulse receiver
achieved a 2Gbps data rate with a total power dissipation of 19.2mW, which is lower than
that of recently reported 60GHz receivers. The performance of this pulse receiver
indicates the possibility of new low-power multi-Gbps wireless communication in the
60GHz band.

Table 1. Comparison of 60GHz receivers.

3.2 49mW 5Gbps CMOS receiver 

The receiver circuit in Section 3.1 operates up to a 2Gbps data rate with a total power 
dissipation of 19.2mW, consuming less power than conventional 60GHz millimeter-wave 
QAM receivers. However, it suffers from input common-mode noise, sensitivity to supply 
voltage, and an insufficient data rate for 4.5Gbps wireless high-definition multimedia 
interface applications. To overcome these issues, a fully differential 5Gbps millimeter-wave 
CMOS impulse radio receiver in an 8M1P 90nm standard CMOS process was realized. The 
receiver contains an on-chip matching circuit, a fully differential envelope detector, a
voltage-controlled amplifier (VGA), a current-mode offset canceller, and a high-speed
comparator with hysteresis.



3.2.1 49mW 5Gbps CMOS receiver design 

A block diagram of the proposed receiver is shown in Fig. 39. The on-chip matching
network provides 50Ω impedance matching and also helps reject out-of-band signals. The
envelope detector detects the envelope of the received pulses, the VGA amplifies the
received signal to the required level, and the high-speed comparator then processes the
signal. The current-mode offset canceller both cancels the offset caused by mismatch
between the differential amplifiers along the receiver chain and drives the
NMOSFETs of the fully differential envelope detector.

Input signals are first applied to the fully differential envelope detector through the
input matching circuit. In practical applications an LNA will be included at the input of
the receiver. Unlike a single-ended LNA, a differential LNA offers superior common-mode
noise rejection (Sun, 2006). The degradation caused by common-mode noise will
be stronger for an impulse radio receiver since the analog front-end and the logic circuits
share the same substrate. To solve this issue, a fully differential CMOS envelope detector
is designed. The fully differential envelope detector (FDD) is shown in Fig. 40, along with
a conventional single-ended detector (SED) for comparison. The SED used in Section 3.1
detects only single-ended pulses. In the proposed FDD, the differential signals are applied
to the gates of two parallel NMOSFETs of the same size. In addition, an active balun is
used to generate a differential output and to provide common-mode rejection, as shown in
Fig. 40. The FDD rejects common-mode noise from the substrate and the power line.

Fig. 39. Block diagram of fully differential 60GHz band millimeter-wave CMOS impulse
radio receiver.



Fig. 40. Millimeter-wave CMOS envelope detector circuits: single-ended envelope detector
(SED) and fully differential envelope detector (FDD) with active balun.







[Plot of Fig. 41: second-order nonlinearity versus drain current ID [mA] (1 to 5) and versus
gate-source voltage VG [V] (0.2 to 1).]



Fig. 41. Second-order nonlinearities with respect to drain current and to gate voltage. 

To improve immunity to PVT variations, a current-mode offset canceller is proposed. The
envelope detector circuits, driven by the offset canceller as well as by the 60GHz input
pulses, detect the envelope of the pulses using the square-law relationship between the
drain current Id and the gate voltage Vg of the NMOSFETs. In (Oncu, 2008, a), Vg was
adjusted to maximize $\partial^2 I_d/\partial V_g^2$ to detect the envelope of the
millimeter-wave pulses efficiently, where Vg is determined by the output common-mode
voltage of the limiting amplifier. Here, the simulated second-order nonlinearity with
respect to Id is shown in Fig. 41, along with that with respect to Vg for comparison. In
both cases, the maximum nonlinearity is obtained when the transistor is biased in the
moderate inversion region. However, the peak of the nonlinearity is flatter when plotted
against Id than when plotted against Vg. When the drain current Id is set with respect to
a reference current Iref, the nonlinearity is therefore insensitive to deviations from the
maximum point caused by PVT variations, and the envelope of the millimeter-wave pulses is
detected efficiently. To exploit this advantage, the current-mode offset canceller is used,
which contains a level shifter, a low-pass filter, a voltage-independent reference current
generator, and a V-I converter.

A high-speed comparator with hysteresis is used in this design to process the input signal
while rejecting noise. Its circuit schematic is shown in Fig. 42. It has three subcircuits:
a positive-feedback decision circuit, a predriver, and a line driver. In the
positive-feedback decision circuit, a differential driver and a positive-feedback load are
composed of NMOSFETs to realize high speed with a moderate bias current. No stacked
transistor is used in the load, in order to maximize the output voltage swing. Two PMOSFET
current mirrors are used between the driver and the load. Since the operating speed of the
PMOSFET current mirrors has to be improved to realize high-speed operation, a higher
overdrive voltage is applied to the PMOSFETs than to the NMOSFETs. The predriver uses a
PMOSFET differential pair to obtain a sufficient bias voltage, since the output common-mode
voltage of the positive-feedback decision circuit is reduced. A CMOS line driver is used
for the final stage. The comparator test circuit was measured at data rates up to 6Gb/s
with a 500mVpp output voltage swing at a supply voltage of 1.2V and a current of 11.9mA,
where the power consumption of the line driver is included.

3.2.2 Measurement and discussion

The fabricated receiver was measured using an on-wafer probe station. The chip micrograph
is shown in Fig. 43; the chip size is 950µm × 750µm. The input reflection coefficient of
the receiver was measured using a 4-port network analyzer. S11dd is less than -10dB at
frequencies from 60GHz to 64GHz. Millimeter-wave pulses are generated using the 8Gbps ASK
CMOS modulator of Section 2.2 to characterize the dynamic behaviour of the receiver.
62GHz differential pulses are applied to the input of the receiver using a magic tee,
and the receiver is also tested using single-ended pulses. The receiver can receive 62GHz
pulses as short as 200ps. The measured receiver sensitivity is approximately -20dBm,
which is suitable for high-speed millimeter-wave proximity communication
applications. An LNA and a high-gain antenna would improve the sensitivity for long-range
applications. An eye diagram of the receiver is obtained using $2^{31}-1$ pseudorandom bits
of data. The eye diagram is obtained at a data rate of 5Gb/s, at which the receiver
consumes a total power of 49mW. The measured receiver performance is summarized in
Fig. 44. The power consumptions of recently reported wireless digital receivers are
compared in Fig. 45. The slope gives the figure of merit, i.e. the energy per bit. The
graph shows that millimeter-wave receivers have better power efficiency than WLAN and UWB
(Nathaward, 2008; Zheng, 2008). The millimeter-wave impulse radio receiver consumes the
lowest energy per bit. The impulse receiver in Section 3.1 and the present impulse receiver
have approximately the same energy-per-bit consumption of 9.8pJ/bit. However, this
receiver is 2.5 times faster than that in Section 3.1. This verifies that millimeter-wave
pulse receivers require low power for high-speed communication. The 60GHz millimeter-wave
band pulse communication can be used for low-power, multi-Gbps wireless multimedia
communication applications using a standard CMOS process.
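
The energy-per-bit comparison can be checked directly from the power and data-rate figures
quoted in Sections 3.1 and 3.2. The short sketch below does only that arithmetic; the
function and variable names are illustrative. It gives about 9.6pJ/bit for the 2Gbps
receiver and 9.8pJ/bit for the 5Gbps receiver.

```python
def energy_per_bit_pj(power_mw, data_rate_gbps):
    """Energy per bit in pJ; 1 mW divided by 1 Gbps equals 1 pJ/bit."""
    return power_mw / data_rate_gbps

rx_2g = energy_per_bit_pj(19.2, 2.0)   # receiver of Section 3.1
rx_5g = energy_per_bit_pj(49.0, 5.0)   # receiver of Section 3.2
speedup = 5.0 / 2.0                    # ratio of the two data rates

print(rx_2g, rx_5g, speedup)
```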
Fig. 43. Chip micrograph.
wireless communication devices.



4. Conclusion 

Millimeter-wave impulse radio for low-power high-speed wireless communication was
studied. Because the 60GHz band offers several GHz of license-free bandwidth, the
millimeter-wave impulse radio was optimized to operate in the 60GHz band. To study the
important building blocks of the millimeter-wave impulse radio, five prototype CMOS
circuits operating in the 60GHz band were successfully realized using 90nm standard CMOS
processes from various foundries. A millimeter-wave CMOS pulse generator, a high-speed
millimeter-wave ASK modulator, a 60GHz pulse transmitter circuit, and 2Gbps and 5Gbps
millimeter-wave CMOS pulse receivers were studied with the aim of realizing low-power,
high-speed millimeter-wave impulse radio.

A carrier-less 60GHz CMOS pulse generator was fabricated using a 6-metal 1-poly 90nm 
CMOS process. By designing pulse generators in digital circuits, a millimeter-wave pulse 
can be generated without using a power-hungry LO. As a result, the pulse generator 
consumes a small amount of power proportional to input data rate. 

Next, to provide better RF performance using available CMOS technologies, pulse
transmitter circuits containing a high-speed millimeter-wave ASK modulator and a 60GHz
oscillator were studied. A 60GHz millimeter-wave band ASK modulator was successfully
fabricated using a 6-metal 1-poly 90nm CMOS process. The maximum isolation at 60GHz
was obtained by adjusting the transmission line length. The isolation and maximum data
rate of the switch were measured to be 26.6dB and 8Gbps, respectively. The ASK modulator
does not consume DC operating power. The results indicate that a very high data rate can be
obtained in the 60GHz millimeter-wave band using a standard CMOS process.

Then, a 60GHz pulse transmitter circuit was realized. To study its building blocks, a 60GHz
millimeter-wave CW signal source and a millimeter-wave ASK modulator circuit were
successfully fabricated in an 8-metal 1-poly 90nm CMOS process. The RF power of the
60GHz CW signal source circuit was measured to be -20.7dBm. The isolation of the ASK
modulator was measured to be 23.5dB at 60GHz. The insertion loss of the modulator is
2.3dB, which is 4.3dB better than that of the previous ASK modulator. The data rate and
output peak-to-peak voltage of the transmitter on a 50Ω load were measured to be up to
10Gb/s and 45mV, respectively. The total power dissipation of the transmitter is 12.1mW.
The results indicate that short-range, multi-Gb/s, low-power wireless communication in the
60GHz millimeter-wave band can be realized using a sub-100nm CMOS technology.

In this study, a low-power 60GHz-band CMOS pulse receiver was proposed for multi-Gbps
wireless communication. To investigate low-power, high-speed pulse receivers, a prototype
60GHz pulse receiver was first realized using a 90nm 1-poly 6-metal standard
CMOS process. The proposed pulse receiver achieved a 2Gbps data rate with a total power
dissipation of 19.2mW, which is lower than that of recently reported 60GHz receivers.
The performance of this pulse receiver indicates the possibility of new low-power
multi-Gbps wireless communication in the 60GHz band.

Then, to suppress the input common-mode noise, reduce the sensitivity to supply voltage,
and reach a sufficient data rate for 4.5Gbps wireless high-definition multimedia interface
(HDMI) applications, a prototype differential 5Gbps 60GHz pulse receiver was
successfully realized in a 1-poly 8-metal standard 90nm CMOS process. It receives
millimeter-wave pulses at up to 5Gbps with a power consumption of 49mW. Both pulse
receivers have approximately the same energy-per-bit consumption, but the second one
operates 2.5 times faster than the first. This verifies that millimeter-wave pulse
receivers require low power for high-speed communication.

Millimeter-wave pulse transmitter and receiver architectures were discussed in this chapter,
where pulse signals can be received without an LO or an ADC by adopting
asynchronous detection, which will lead to the realization of a low-power millimeter-wave
wireless transceiver system. The study of CMOS millimeter-wave impulse radio will
encourage the widespread adoption of consumer millimeter-wave applications.

1. Introduction

Power amplifiers (PAs) determine much of the efficiency and linearity of transmitters in 
wireless communication systems, both on the base station side and in the handset device. 
With the move to third-generation (3G) communication systems as well as other systems 
such as Ultra-Wideband (UWB), a higher linearity is required due to envelope variations of 
the radio frequency (RF) signal. The traditional way of guaranteeing sufficient linearity is 
backing off the PA; however, this results in a significant drop in efficiency, and thus in 
reduced battery lifetime for the handheld device and increased cooling requirements for the 
base station. With the current energy costs, and increased density of base stations, this is fast 
becoming an important issue. 

A second issue in current wireless communication systems is the requirement for a certain 
range of transmitter output power control, e.g. for 3G systems. Depending on the distance to 
the base station, a difference in handset output power in the range of tens of dB may occur. 
If the PA efficiency is peaking for maximum output power, and is reduced considerably for 
lower output power, the average efficiency of the transmitter calculated over its full output 
power range of operation will be low. Thus, efficiency improvement for lower output power 
is an important aspect in transmitter design. 
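
The effect of power control on average efficiency can be illustrated with a small
calculation. The sketch below is not taken from this chapter: it assumes an ideal class-B
efficiency law, in which efficiency scales with the square root of the output power and
peaks at pi/4, and it assumes that the handset spends equal time at each back-off level
between 0dB and 30dB. Both assumptions, and all names, are illustrative.

```python
import math

def efficiency(p_out, p_max, eta_max=math.pi / 4):
    """Assumed ideal class-B law: efficiency scales with sqrt(Pout/Pmax)."""
    return eta_max * math.sqrt(p_out / p_max)

def average_efficiency(backoffs_db, p_max=1.0):
    """Total RF energy over total DC energy, equal time spent at each level."""
    p_out = [p_max * 10 ** (-b / 10.0) for b in backoffs_db]
    p_dc = [p / efficiency(p, p_max) for p in p_out]
    return sum(p_out) / sum(p_dc)

levels_db = list(range(0, 31, 5))       # 0 dB to 30 dB back-off in 5 dB steps
avg = average_efficiency(levels_db)     # energy-weighted average efficiency
peak = efficiency(1.0, 1.0)             # efficiency at maximum output power
at_20db = efficiency(0.01, 1.0)         # efficiency at 20 dB back-off

print(peak, avg, at_20db)
```

Under these assumptions the efficiency at 20dB back-off is one tenth of the peak value, and
the average over the power range is well below the peak.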

Moreover, current wireless communication handsets require a multi-band/multi-standard
approach, so that several communication standards are incorporated in one device. Ideally
this would all be achieved by one PA, but current practice is to use multiple PAs for
multiple standards, in the worst case each with its own bulky, costly output filter.

In order to address efficiency and linearity issues, different transmitter architectures
have been proposed and implemented over the years, such as Envelope
Elimination and Restoration (EER) or Envelope Tracking (ET), varieties of polar
transmission in which the envelope and phase of the signal are processed separately. Also,
different PA architectures have been used, such as Doherty and switched-mode amplifiers,
often complemented with linearity-improving measures such as digital predistortion or
feedback.

With the coming of age of handset production, cost effectiveness has driven wireless
communication transceiver design to higher levels of integration. As many building blocks
as possible are integrated on the same chip, and the use of external bulky filters is
avoided where possible. CMOS technology has been the main choice for this development,
because it allows digital, mixed-signal and analog circuits to be integrated together.
However, CMOS was long unsuitable for PA design because of frequency, output power,
efficiency and linearity requirements. Thus, the stand-alone PA has long been manufactured
in III-V technologies or specialized technologies such as LDMOS.

In recent years however, CMOS technology has evolved for radio frequencies in two ways: 
(1) Decreasing device dimensions have resulted in higher clocking frequencies, thus e.g. 
providing the opportunity for clocking speeds of several times the RF frequency; (2) The 
technology provides special RF properties such as thick top metal, allowing for e.g. 
integrated inductors or transformers with high quality factor. These two technology trends 
have enabled a higher level of transmitter integration. In combination with the use of 
switches, for which CMOS devices are extremely suitable, so-called digitally assisted RF 
transmitters have been designed, that is, transmitters where building blocks are switched on 
or off by means of digital control signals, or biasing settings are changed based on digital 
signals. 

Recently transmitter design research has taken the next step: increasingly using digital 
techniques for the full transmitter. A fully integrated GSM radio has been presented with 
all-digital phase and amplitude signal paths, including an all-digital phase-locked loop. 
Other examples are a class-E switched mode PA with pulse-width and pulse-position 
modulation (PWPM) implemented with all-digital blocks, an array of power mixers, 
controlled by digital logic, and an array of digitally controlled cascode transconductance 
stages not unlike current-steering digital-to-analog converters, referred to as digital-to-RF 
conversion. However, efficiency over a wide power range is still a major concern, as will be 
shown. 

In this chapter an overview of switched-mode power amplifiers will be presented. This will 
be followed by an overview of transmitter architectures suitable for switched-mode 
transmitters; direct modulation as well as polar and Cartesian modulation will be described 
by looking at traditional architectures and recent developments, with a focus on
switched-mode implementations, resulting in a future outlook for integrated transmitter
design for wireless communication.

2. Power amplifier technology issues 

Generally a switched-mode (SM) amplifier consists of one or more transistors that
behave as a switch, that is, having an on and an off state, ideally with zero on-resistance
and near-zero rise and fall times. These conditions can be approximated by heavily
overdriving the transistor input, and by operating the device at frequencies significantly
lower than the device's ft. The SM transistor is thus used differently from normal
amplifier transistors, which are generally used as current, voltage or transconductance
amplifying elements. Overdriving the transistor input, however, has certain consequences:
the device will act non-linearly, and small-signal models are not always valid. Moreover,
for wireless communication applications the difference between the operating frequency and
the device unity-gain frequency ft is rather small, in contrast to e.g. audio applications,
where switched-mode techniques have been used extensively. In this section we will first
discuss power amplifier technology issues, and then address losses in switched-mode power
amplifiers.

2.1 PA technology aspects 

Only fairly recently has CMOS technology emerged as an alternative for integrated
circuit power amplifier design, as CMOS was previously not suitable for PA design due to
frequency, output power, efficiency and linearity requirements. Thus, stand-alone PAs have
long been manufactured in III-V technologies such as GaAs or GaN, or specialized
technologies such as LDMOS or SiGe bipolar junction transistors.

Driven largely by the push to integrate more digital functionality on the same chip area,
CMOS devices have continued to shrink in device dimensions, basically following Moore's
law. Accordingly, transistor ft and fmax are expected to rise to several hundreds of GHz,
thus allowing for circuit operation in excess of 100GHz (Niknejad et al., 2007).

However, the trend of shrinking device dimensions comes with distinct
disadvantages for analog circuit design, and more specifically for PA design. Due to
shrinking oxide thickness, the breakdown voltage of the devices is reduced, implying that
supply voltages must be reduced for safe operation. This has implications for CMOS PAs, as
the maximum output power, assuming load-line matching, is then given by

$$P_{out} = \frac{V_{DD}^2}{2 R_L} \qquad (1)$$

where $R_L$ is the load resistance, such that in a 50Ω system with a supply voltage of 1V,
the output power is limited to 10mW or 10dBm. Thus, impedance transformation must be used
so that the amplifier sees a lower impedance. This is practically limited to 1-5Ω. Such a
low impedance makes the PA efficiency very sensitive to parasitic series resistance in the
output network, because of conduction losses: a 0.1Ω parasitic resistance in series with a
load resistance of 1Ω gives a loss of about 10%.
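
The numbers in this paragraph can be reproduced with a few lines of arithmetic. The sketch
below evaluates Eq. (1) for a 1V supply and estimates the share of power lost in a
resistance in series with the load. The function names are illustrative, and the 0.1Ω and
1Ω values are assumed example values chosen to give a loss of roughly 10%.

```python
import math

def max_output_power_w(vdd, r_load):
    """Load-line limited output power, Pout = VDD^2 / (2 * R_load), Eq. (1)."""
    return vdd ** 2 / (2.0 * r_load)

def to_dbm(power_w):
    """Convert a power in watts to dBm."""
    return 10.0 * math.log10(power_w / 1e-3)

def series_loss_fraction(r_parasitic, r_load):
    """Fraction of power dissipated in a resistance in series with the load."""
    return r_parasitic / (r_parasitic + r_load)

p_50 = max_output_power_w(1.0, 50.0)    # 1 V supply into 50 ohm
p_1 = max_output_power_w(1.0, 1.0)      # 1 V supply into a transformed 1 ohm load
loss = series_loss_fraction(0.1, 1.0)   # assumed 0.1 ohm parasitic, 1 ohm load

print(p_50, to_dbm(p_50), p_1, loss)
```

Transforming the load from 50Ω down to 1Ω raises the available output power by a factor of
50, but the same 0.1Ω of parasitic resistance would cost only about 0.2% in a 50Ω system.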

Due to these increasing technology limitations, in modern CMOS deep-submicron 
technologies special transistors are provided having a thicker gate oxide and thus allowing 
for higher supply voltage. 

2.2 Losses in switched-mode amplifiers 

Looking at RF power amplifiers, we want to have an output signal at the frequency of 
interest - usually the fundamental frequency, sometimes a harmonic - but no disturbing 
output signals at other frequencies. In other words, some filtering must be performed in 
order to use a switch in a power amplifier. 

The ideal waveforms for a switched-mode (SM) transistor in a PA, assuming a broadband 
load, are shown in Fig. 1. From this figure it can be seen that the voltage and current are 
ideally never non-zero simultaneously, thus no power is consumed, and ideally a 100% 
efficiency can be achieved. However, considerable power is generated at harmonic 
frequencies. Thus the maximum theoretical efficiency for this broadband SM PA is slightly 
larger than 80%, achieved at a 50% duty cycle. 
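
The figure of slightly more than 80% can be reproduced with a short calculation. The sketch
below is not a derivation taken from this chapter: it assumes an ideal 50% duty-cycle square
wave into a load that is resistive at all frequencies, so that the AC power splits among the
harmonics in proportion to their squared Fourier amplitudes. The function name is
illustrative.

```python
import math

def harmonic_power_fraction(n):
    """Share of a 50% duty square wave's AC power carried by harmonic n."""
    if n % 2 == 0:
        return 0.0                      # even harmonics vanish at 50% duty cycle
    return 8.0 / (math.pi * n) ** 2     # (4 / (n*pi))^2 / 2 for a unit square wave

fundamental = harmonic_power_fraction(1)                       # 8 / pi^2
total = sum(harmonic_power_fraction(n) for n in range(1, 20001))

print(fundamental, total)
```

The fundamental carries 8/pi^2 of the power, about 81%, and the remainder is spread over the
odd harmonics, which is the power that a tuned output network aims to suppress.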

In order to reduce the power present at harmonic frequencies, a tuned amplifier can be
used. This can be implemented in several ways. One way is to introduce harmonic shorts
in parallel with the load resistance $R_L$ in Fig. 1, so that harmonics other than the
desired frequency are grounded. The maximum theoretical efficiency then reaches 100%, but
only for relatively low duty cycles (and thus very short pulses and low output power)
(Cripps, 1999, p. 153).

Another strategy is to place a resonance circuit in series with $R_L$, to make sure that
only the desired frequency signal is passed on. This issue will be explored further in the
section on class-F amplifiers.

Device and switching losses 

Aside from the harmonic losses discussed in the previous section, some other losses can be
identified in an SM amplifier or transistor (El-Hamamsy, 1994). First of all, the
transistor will suffer from non-idealities, one of which is a non-zero on resistance. This
causes a non-zero voltage drop and thus so-called conduction loss, resulting in reduced
efficiency. Secondly, the transistor will have non-zero rise and fall times, potentially
causing the current and voltage to be non-zero simultaneously. CMOS subthreshold current
also contributes to this.

Fig. 1. An ideal switched-mode (SM) power amplifier. (a) Schematic. (b) Voltage and
current waveforms.

Thirdly, dynamic losses due to charging and discharging of parasitic capacitors must be 
taken into account - the switching losses. These are proportional to the switching frequency 
f, and will likely dominate for RF applications. 
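
The frequency dependence of the switching loss can be made concrete with the common
first-order estimate P = C*V^2*f for a capacitance that is charged from the supply and
discharged through the switch once per cycle. This expression is an assumption brought in
for illustration and is not given in this chapter, and the 1pF and 1V values are arbitrary
example values.

```python
def switching_loss_w(c_parasitic, v_swing, freq):
    """First-order dynamic loss: C * V^2 dissipated once per switching cycle."""
    return c_parasitic * v_swing ** 2 * freq

def conduction_loss_w(i_rms, r_on):
    """Loss in the on resistance of the switch: I_rms^2 * R_on."""
    return i_rms ** 2 * r_on

loss_audio = switching_loss_w(1e-12, 1.0, 1e6)   # 1 pF, 1 V, 1 MHz switching
loss_rf = switching_loss_w(1e-12, 1.0, 2e9)      # same node switched at 2 GHz
ratio = loss_rf / loss_audio

print(loss_audio, loss_rf, ratio)
```

The same parasitic capacitance costs 2000 times more power at 2GHz than at 1MHz, whereas the
conduction loss for a given current does not depend on frequency.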

Other losses 

External elements such as output networks may cause losses as well, for example a tuning 
or impedance transformation network consisting of on-chip or discrete passive elements. 
These inductors and capacitors will include parasitics such as capacitances or series 
resistances. These may cause power dissipation and thus reduce the amplifier efficiency. 

A MOSFET is very suitable as a switch, toggling between the off mode for low gate-source
voltage $V_{GS}$, and the triode region for high $V_{GS}$. The on resistance of the device
is then given by

$$R_{on} = \frac{L}{W} \cdot \left( k' \left( V_{GS} - V_t - V_{DS} \right) \right)^{-1} \qquad (2)$$

where $L$ is the transistor length, $W$ the transistor width, $k'$ the transistor gain
factor, $V_t$ the threshold voltage, and $V_{GS}$ and $V_{DS}$ the gate-to-source and
drain-to-source voltage, respectively.

The on resistance can thus be minimized by choosing a large ratio W/L. Having a low 
resistance decreases the conduction losses caused by the switch. Other considerations of 
interest for PA design are the current density capacity and parasitic capacitances. The 
former is important if high output power is desired and the supply voltage is low. A larger 
width increases the current capacity. The parasitic capacitance may, however, cause 
increased dynamic losses, thus potentially decreasing the efficiency especially at high 
frequencies. 
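
The trade-off described above can be sketched by evaluating Eq. (2) for two device widths.
The device parameters below (gain factor, threshold voltage, dimensions and bias voltages)
are assumed illustrative values and do not describe a particular process.

```python
def on_resistance(w, l, k_prime, vgs, vt, vds):
    """Triode-region on resistance from Eq. (2): (L/W) / (k' * (VGS - Vt - VDS))."""
    overdrive = vgs - vt - vds
    if overdrive <= 0:
        raise ValueError("device is not in the triode region")
    return (l / w) / (k_prime * overdrive)

K_PRIME = 200e-6   # assumed gain factor [A/V^2]
L_GATE = 0.1e-6    # assumed gate length [m]

r_1mm = on_resistance(1e-3, L_GATE, K_PRIME, vgs=1.0, vt=0.3, vds=0.05)
r_2mm = on_resistance(2e-3, L_GATE, K_PRIME, vgs=1.0, vt=0.3, vds=0.05)

print(r_1mm, r_2mm)
```

Doubling the width halves the on resistance, but in a real device it also roughly doubles
the parasitic capacitance, so the conduction and dynamic losses have to be balanced.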
3. CMOS switched-mode power amplifiers

Now that general technology issues have been discussed, SM amplifiers for radio 
frequencies will be addressed in this section, and an overview will be given of specific 
CMOS implementations. 



3.1 Switched-mode amplifier classes 

In amplifier theory, several different switched-mode types are established: the classes D, E 
and F (Cripps, 1999; Raab, 2001). They will briefly be addressed below, before looking into 
CMOS implementations in the next section. 

Class-D 

Class-D amplifiers use a double-switch structure with a series resonance circuit (see
Fig. 2). The output current is supplied alternately by each switch, similar to a push-pull
configuration. The simplest implementation of the two switches is an inverter. The
maximum theoretical efficiency is 100%, with a square-wave voltage and a half-wave
rectified sine wave current in each device. In that case the voltage contains only odd
harmonics, and the current even harmonics.

Fig. 2. Simplified schematic of a class-D amplifier. (a) A voltage-mode amplifier. (b) A
current-mode amplifier.

This amplifier may also be implemented as current-mode (see Fig. 2b). Instead of having a 
series resonance circuit in series with the load, a parallel resonance circuit is then used at the 
output of the amplifier. In that case the current approximates a square-wave, containing odd 
harmonics, while the drain voltage for each device approximates a half-wave rectified sine 
wave. It has been shown that a high efficiency can be achieved, assuming the amplifier can 
be designed for Zero Voltage Switching (Long et al., 2002; Kobayashi et al., 2001). 
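
The harmonic content of the two idealised class-D waveforms can be checked numerically. The following is a small sketch that evaluates Fourier harmonic amplitudes by direct summation; the helper names are hypothetical and the waveforms are normalised to unit peak value.

```python
import math

def harmonic_amplitude(waveform, n, samples=4096):
    """Amplitude of the n-th harmonic of a waveform defined on phase [0, 2*pi)."""
    re = im = 0.0
    for k in range(samples):
        theta = 2 * math.pi * k / samples
        x = waveform(theta)
        re += x * math.cos(n * theta)
        im += x * math.sin(n * theta)
    return 2 * math.hypot(re, im) / samples

def square(theta):
    """Ideal switch voltage: high for half the period, zero otherwise."""
    return 1.0 if theta < math.pi else 0.0

def half_sine(theta):
    """Half-wave rectified sine, the ideal current in each device."""
    return max(math.sin(theta), 0.0)

for n in range(1, 6):
    print(n, round(harmonic_amplitude(square, n), 4),
          round(harmonic_amplitude(half_sine, n), 4))
```

The square wave shows energy only at odd harmonics, while the half-wave rectified sine shows the fundamental and even harmonics only, so the two waveforms overlap only at the fundamental.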

Class-E 

A class-E amplifier consists of a single switching device with a carefully tuned output 
network. The voltage derivative, close to the timing point when the device is switched off, is 
designed to be very small (so-called Zero Voltage Switching, ZVS) so that potential static 
losses are kept very low. The theoretical maximum efficiency of this amplifier is also 100%. 
One of the characteristics of class-E is that large voltage peaks occur; thus, care must be 
taken to avoid high voltages across the CMOS device, as the breakdown voltage of CMOS 
devices is relatively low. 

Class-F 

A class-F amplifier is basically an amplifier with a current that approaches a half-wave 
rectified sine wave, and a voltage that approaches a maximally flat shape. Tuning a limited 
number of odd-order harmonics of the fundamental signal is used to achieve this. Two 
different structures are in use for class-F design, depending on which harmonics are seen at 
the drain: regular class-F for odd-order harmonics, in which the drain voltage is 
approximately maximally flat, and inverse class-F for even harmonics, which has a half-wave 
rectified sine-shaped drain voltage and a maximally flat drain current (Raab, 2001). It must 
be noted that the inherent pulse shaping makes this amplifier less suitable for e.g. Pulse 
Width Modulated (PWM) input signals (Sjoland et al., 2009). 

All three amplifier classes depend to some extent on a frequency-selective output network. 
Thus, their operation cannot be considered broadband. Either they can only be used in a 
narrow, specific frequency range, or each amplifier's behavior may show significant 
differences depending on the frequency of operation. 

Research is progressing into variable output networks, where digital control signals are 
used to e.g. change the frequency of operation, or reconfigurable PAs, as well as output 
networks allowing for concurrent multi-band operation (Colantonio et al., 2008). In such 
digitally assisted systems the use of CMOS technology, also for the PA, may lead to a higher 
level of integration. This will be addressed more extensively in the section on transmitter 
architectures. 

3.2 CMOS PA implementations 

By the mid-1990s, the first publications on integrated CMOS PAs for RF appeared. These 
works initially focused on more or less linear amplifier structures such as class A, AB, B or 
C, but research has since focused more on the switched-mode classes D, E and F, as 
higher clocking or switching speeds became available with improvements in CMOS 
technology. 

Su and McFarland (1997) presented a 0.8 µm CMOS SM amplifier consisting of four stages 
with the final stage in switched-mode. A Power-Added Efficiency (PAE) of 42% was 
achieved at 850 MHz with a 2.5 V supply, and largely off-chip input and output matching 
networks were used. Yoo and Huang (2001) presented a 0.25 µm CMOS class-E PA, using a 
finite DC feed inductor to reduce the peak voltage over the device, as well as Common Gate 
(CG) switching instead of the more usual Common Source (CS) structure. These strategies 
allow for a higher supply voltage to be used, thus reducing the necessity for a low load 
impedance. 

Reynaert and Steyaert (2005) presented a fully integrated 0.18 µm CMOS class-E PA, 
consisting of three stages and including supply modulation to provide amplitude variation. 
A PAE of 34% was achieved for an output power of 23.8 dBm, using a supply voltage of 
3.3 V and extra thick gate oxide for the final stage. 

As limited supply voltage is one of the major challenges in CMOS PA design, other 
strategies have been used to effectively add the output voltages, such as using a transformer 
to combine output power (Aoki et al., 2008; Haldi et al., 2008) or stacking devices, making 
sure that the voltage over each device stays below the maximum (Stauth & Sanders, 2008; 
Jeong et al., 2006). However, generally this slightly impairs the efficiency, counteracting the 
intended advantage of a higher supply voltage. Apart from voltage stacking, current 
combining has been implemented (Kavousian et al., 2008; Kousai & Hajimiri, 2009), as well 
as the switching in of several parallel stages (Walling et al., 2008). The latter two will be 
covered more in the section on transmitter architectures. 



Reference              Class  Technology     Supply voltage  Output power  Efficiency (PAE)  Frequency
Su et al., 1997        D?     0.8 µm CMOS    2.5 V           30 dBm        42%               850 MHz
Tsai et al., 1999      E      0.35 µm CMOS   2.0 V           30 dBm        48%               1.9 GHz
Yoo et al., 2000       E      0.25 µm CMOS   1.9 V           30 dBm        41%               900 MHz
Kuo et al., 2001       F      0.2 µm CMOS    3.0 V           32 dBm        43%               900 MHz
Sowlati et al., 2003   ?      0.18 µm CMOS   2.4 V           24 dBm        42%               2.4 GHz
Reynaert et al., 2005  E      0.18 µm CMOS   3.3 V           24 dBm        34%               1.75 GHz
Stauth et al., 2008    D      90 nm CMOS     2.0 V           20 dBm        38.5%             2.4 GHz



Table 1. An overview of CMOS integrated switched-mode power amplifiers. 
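
To put the dBm figures of Table 1 into perspective, the short sketch below converts output power to watts and estimates the DC power involved. It assumes the usual definition PAE = (Pout - Pin)/Pdc; because the drive power Pin is not listed in the table, Pout/PAE is only an upper bound on the DC power. The function names are hypothetical.

```python
def dbm_to_watt(p_dbm):
    """Convert a power level in dBm to watts (0 dBm = 1 mW)."""
    return 10 ** (p_dbm / 10) / 1000

def dc_power_upper_bound(p_out_dbm, pae):
    """Upper bound on DC power in watts: Pdc = (Pout - Pin)/PAE <= Pout/PAE."""
    return dbm_to_watt(p_out_dbm) / pae

# Two rows of Table 1: (reference, output power in dBm, PAE)
rows = [("Su et al., 1997", 30, 0.42), ("Stauth et al., 2008", 20, 0.385)]
for ref, p_dbm, pae in rows:
    print(f"{ref}: Pout = {dbm_to_watt(p_dbm):.2f} W, "
          f"Pdc <= {dc_power_upper_bound(p_dbm, pae):.2f} W")
```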

4. Transmitter architectures 

As we have seen before, one of the basic requirements for power amplifiers in modern 
wireless communication systems is to accommodate envelope variations and to provide 
variable output power. Wireless communication standards have moved from 
constant-envelope signals with low channel bandwidth to more complex signal shapes in 
order to increase data rates in limited bandwidth, resulting in variable-envelope RF signals 
and larger channel bandwidths in the range of tens of MHz. 

In SM amplifiers output power variation can be achieved by varying the supply voltage, by 
varying the duty cycle of the signal, by varying the load, or by a combination of these. In 
this section some transmitter architectures will be discussed that adopt such strategies; only 
the strategy of varying the load impedance will not be addressed here. 

4.1 Supply variation 

On the transmitter architecture level, one of the classical methods of varying the output 
power is based on polar modulation, where a baseband Cartesian signal v_RF(t) is first 
converted into its polar form, separating envelope (amplitude) and phase information, 
which are then processed separately and combined before being transferred to the antenna: 

$$
\begin{aligned}
v_{RF}(t) &= I(t)\cos(2\pi f t) + Q(t)\sin(2\pi f t) && \text{(Cartesian)} \\
          &= A(t)\cos\bigl(2\pi f t + \varphi(t)\bigr) && \text{(polar)}
\end{aligned}
\tag{3a}
$$

where 

$$
\begin{aligned}
A(t) &= \sqrt{I(t)^2 + Q(t)^2} && \text{(amplitude)} \\
\varphi(t) &= -\tan^{-1}\bigl(Q(t)/I(t)\bigr) && \text{(phase)}
\end{aligned}
\tag{3b}
$$

Polar modulation has recently been gaining more and more interest due to its potential to 
maintain linearity while having a relatively high efficiency even for lower output power, 
thus improving the average efficiency over a wide output power range. 
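
The Cartesian-to-polar conversion can be sketched in a few lines. With the sign convention of Eq. (3a), matching terms gives I = A cos(phi) and Q = -A sin(phi), so this sketch uses atan2(-Q, I), which also resolves the quadrant. The function names are hypothetical, and the check at the end only confirms that both forms describe the same RF signal.

```python
import math

def cartesian_to_polar(i, q):
    """Return (amplitude, phase) such that
    i*cos(wt) + q*sin(wt) == amplitude*cos(wt + phase)."""
    amplitude = math.hypot(i, q)
    phase = math.atan2(-q, i)
    return amplitude, phase

def rf_from_cartesian(i, q, f, t):
    w = 2 * math.pi * f * t
    return i * math.cos(w) + q * math.sin(w)

def rf_from_polar(a, phi, f, t):
    return a * math.cos(2 * math.pi * f * t + phi)

# Consistency check at an arbitrary time instant (values are illustrative)
i, q, f, t = 0.3, -0.8, 1e9, 0.37e-9
a, phi = cartesian_to_polar(i, q)
print(rf_from_cartesian(i, q, f, t), rf_from_polar(a, phi, f, t))
```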

One of the most well-known polar schemes is Envelope Elimination and Restoration (EER), 
brought to attention by Khan (Khan, 1952; Wang et al., 2006; Su & McFarland, 1998). The 
envelope is used to control the PA supply level, while the phase signal is upconverted to RF 
and transformed to a constant envelope signal, driving the PA input. Thus, a non-linear PA 
can be used. Su and McFarland (1998) have demonstrated a CMOS implementation of an 
EER system, including a delta-modulated supply, a limiter, and envelope detectors, driving 
a switched-mode PA, resulting in significant linearity and efficiency improvements. 

Fig. 3. Simplified representation of the Envelope Elimination and Restoration (EER) and 
Envelope Tracking (ET) transmitter architectures. 

Envelope tracking (ET) describes a transmitter architecture where the Cartesian RF signal is 
amplified by means of a linear amplifier, with its supply controlled by the envelope of the 
signal (Hanington et al., 1999; Takahashi et al., 2008). One of the main advantages is that the 
bandwidth of the PA input signal is not expanded, but a linear amplifier generally has a 
lower efficiency than an SM amplifier. However, requirements on the envelope signal and 
timing are less stringent (Wang et al., 2006). So-called hybrid EER architectures have been 
demonstrated, where the ET linear amplifier is replaced by an SM amplifier that is, however, 
still driven by the full Cartesian RF signal (Wang et al., 2006). 

The EER, ET and hybrid EER architectures all depend on an efficient power supply 
modulator that must be able to handle the bandwidth of the envelope signal. For this, a 
boost DC-DC converter, a buck DC-DC converter, or a switched-mode low-frequency 
amplifier can be used, controlled by a Pulse Width Modulator (PWM), a Sigma-Delta 
modulator (ΣΔM) or a Delta modulator (ΔM) (Kitchen et al., 2007). Generally, independent 
of supply modulator type, a bulky low-pass filter must be used to filter out undesired 
signals such as noise or harmonics. 

4.2 Changing the duty cycle 

If the duty cycle D of a square-wave signal is changed, the output power at the fundamental 
frequency will be changed according to 

$$P_{out}(f_0) = \frac{4 V_{DD}^2}{\pi^2 R_L}\,\sin^2(\pi D) \tag{4}$$

assuming ideal frequency selection at the output. This can be used to accommodate the 
envelope and power variations in a polar transmitter, by changing the amplifier's threshold 
voltage. Implementations exist with discrete steps as well as continuous change (Yang et al., 
1999; Cijvat et al., 2008; Smely et al., 1998). A major advantage of these strategies is that no 
DC-DC converter is necessary; a disadvantage is that linearity may be worse than for 
an amplifier where the supply voltage is changed, possibly resulting in tougher 
requirements for digital predistortion. Moreover, the efficiency drops rapidly at small duty 
cycles (Cijvat et al., 2008). 
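
The duty-cycle dependence in Eq. (4) can be explored with a short sketch. To stay independent of the prefactor, the power is normalised to its maximum at D = 0.5, so only the sin^2(pi*D) term is used; the function names and the parameter p_max are hypothetical.

```python
import math

def output_power(duty, p_max=1.0):
    """Fundamental output power for duty cycle D, relative to p_max at D = 0.5."""
    return p_max * math.sin(math.pi * duty) ** 2

def duty_for_power(p_out, p_max=1.0):
    """Duty cycle (0 < D <= 0.5) that gives the requested output power."""
    return math.asin(math.sqrt(p_out / p_max)) / math.pi

for backoff_db in (0, 3, 6, 10, 20):
    p = 10 ** (-backoff_db / 10)
    print(f"{backoff_db:2d} dB back-off -> D = {duty_for_power(p):.3f}")
```

Because the power follows sin^2(pi*D), deep back-off requires very narrow pulses, which is where the efficiency drop mentioned above becomes significant.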

Smely et al. (1998) combined discrete supply voltage steps with changing the drain current 
of the output stage of a class-F stage by means of varying the GaAs MESFET gate voltage, 
depending on the amplitude of the input signal. Yang et al. (1999) focused on improving the 
efficiency of a class-A amplifier, by using variable bias to change the current in the output 
stage as well as changing the supply voltage. 

Variable gate bias was used (Cijvat et al., 2008) for CMOS class-D amplifiers, with the goal 
of creating a PWM signal at the output of the amplifier. The proposed architecture uses the 
envelope signal to control the gate bias, and the RF signal is assumed to be sinusoidal, 
containing only the phase information. 

For this amplifier structure, loss mechanisms as discussed in section 2 cause a drop in drain 
efficiency for lower output powers. It is likely that switching and harmonic losses are 
significant; the amplifier switches as often as for full output power, thus having roughly the 
same switching loss, and the harmonic content of a PWM signal increases for duty cycles 
other than 0.5, thus increasing harmonic losses. As can be seen in Fig. 4b, the amplifier 
aimed for higher output power, having larger output devices and thus larger parasitic 
capacitances, reaches a lower maximum drain efficiency as a result. 

As was addressed by Sjoland et al. (2009), one of the challenges of polar modulation is the 
sharp notch in amplitude variation which causes fast amplitude variations that are difficult 
to track for a DC-DC converter with limited bandwidth. Thus, a combination of EER and 
Pulse Width Modulation is proposed. This is applied to the aforementioned 130 nm CMOS 
class-D inverters, and simulation results are presented in Fig. 5. 

It can be seen from this figure that efficiency gains of EER and PWM combined are minimal 
in this case, compared to EER-only. Moreover, combining the two strategies will lead to 
greater transmitter complexity; the additional power that is required is not taken into 
account in the simulations. However, as was mentioned earlier, this solution may address 
the bandwidth limitations of EER. 
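
One way to picture the combined scheme is as a split of the envelope into a supply-voltage command and a duty-cycle command. The sketch below is an assumption-laden illustration, not the cited implementation: it assumes the output amplitude is proportional to V_supply * sin(pi*D), as Eq. (4) suggests, expresses the envelope as an equivalent supply voltage, and uses a hypothetical border value below which the supply is held constant and PWM takes over.

```python
import math

def hybrid_control(envelope, border):
    """Split an envelope value (in volts, >= 0) into (supply_voltage, duty_cycle).

    Above the border the supply follows the envelope (EER) at 50% duty cycle;
    below it the supply stays at the border value and the duty cycle is reduced.
    """
    if envelope >= border:
        return envelope, 0.5
    duty = math.asin(envelope / border) / math.pi
    return border, duty

for env in (1.2, 0.9, 0.6, 0.3, 0.1):
    v_supply, duty = hybrid_control(env, border=0.6)
    print(f"envelope {env:.1f} V -> supply {v_supply:.2f} V, duty {duty:.3f}")
```

In this sketch the supply modulator only has to track the envelope above the border value, which is the part of the signal with slower relative variation.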

[Figure 4 plot: panel title "UMC 130nm, Pout vs. efficiency"; x-axis "Output power [dBm]"; legend entries "Efficiency", "PWMVGB-6" and "PWMVGB-12"; panels (a) and (b).]

Fig. 4. (a) Measured output power and efficiency of a 6 dBm 130 nm CMOS class-D inverter 
chain, using gate bias variation to create a pulse width modulated inverter output voltage 
(Cijvat et al., 2008). (b) Efficiency versus output power of two amplifiers, one with 6 dBm 
and one with 12 dBm output power. The supply voltage was 1.2 V. The 6 dBm amplifier 
operated at 1.5 GHz, the 12 dBm amplifier at 1 GHz. 

Fig. 5. Simulated PA drain efficiency versus output power, combining EER modulation for 
high amplitudes and PWM for lower amplitudes. The voltage where EER takes over is 
varied; one curve shows results for a border value of 0.6 V and the second curve for a border 
value of 0.9 V. 



4.3 Burst-mode transmitters 

A third method for varying the output power is so-called burst-mode transmission. 
Effectively the RF signal is turned on and off by means of a bit stream. The envelope signal 
may be digitized e.g. by means of a ΣΔ modulator or a Pulse Width Modulator (Jeon et al., 
2005; Berland et al., 2006; Stauth & Sanders, 2008). 

A burst-mode pulsed power oscillator to be used as a final stage in a transmitter was 
presented by Jeon et al. (2005). The oscillator is turned on and off by a PWM representation 
of the low-frequency envelope signal, thus resulting in the high-frequency RF signal 
multiplied by the PWM signal, appearing as bursts at the oscillator output. An isolator and 
a bandpass filter are used to prevent reflected power from returning into the oscillator and 
to filter out undesired frequency components. 

Berland et al. (2006) analyzed two varieties of using a one-bit signal to be multiplied with the 
slightly modified Cartesian signal. The one-bit signal was derived from the envelope signal by 
utilizing a Pulse Width Modulator and a Sigma-Delta Modulator, respectively. A high 
operating frequency of several GHz is, however, necessary to reach sufficient performance. 

A polar modulator using a baseband ΣΔM and an RF Pulse Density Modulator (PDM) was 
used to drive a class-D amplifier with a 1-bit signal (Stauth & Sanders, 2008). This solution, 
basically all-digital, was implemented in 90 nm CMOS and the cascode PA operated from a 
2 V supply. The PA performance can be seen in Table 1. The Bluetooth 2.1+EDR spectral 
mask was met for an output signal in the range of 10 dBm, including a bandpass filter at the 
output. 
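
The one-bit envelope encoding that burst-mode transmitters rely on can be illustrated with a minimal first-order, accumulator-style ΣΔ encoder. This is a generic textbook sketch, not the modulator of any of the cited works; the function name is hypothetical and the envelope is assumed to be normalised to the range 0 to 1.

```python
def sigma_delta_bits(samples):
    """Encode envelope samples in [0, 1] as a 0/1 bit stream.

    The accumulator integrates the input; whenever it reaches 1 a '1' is
    emitted and subtracted again, so the bit density tracks the envelope.
    """
    acc = 0.0
    bits = []
    for x in samples:
        acc += x
        if acc >= 1.0:
            bits.append(1)
            acc -= 1.0
        else:
            bits.append(0)
    return bits

bits = sigma_delta_bits([0.25] * 8)
print(bits, sum(bits) / len(bits))
```

Each '1' would gate a burst of RF carrier and each '0' would switch the amplifier off; the bandpass filter at the output then recovers the average, which is why a high bit rate relative to the envelope bandwidth is needed.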

4.4 Digitally controlled TX 

In analogy to current-steering Digital-to-Analog converters (Zhou & Yuan, 2003), a fourth 
strategy to control output power has recently gained attention, which is switching in 
parallel stages. One example is the work by Kavousian et al. (2008), where the low- 
frequency envelope of the polar signal was transformed into a thermometer code used to 
switch on and off unit stages, while the constant-envelope RF phase signal drives the input 
of each stage. The authors refer to this as digital-to-RF conversion. 
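
The thermometer-coded control of unit stages can be sketched as follows. The function name, the normalisation of the envelope to the range 0 to 1, and the rounding rule are assumptions made for illustration only.

```python
def thermometer_code(envelope, n_units):
    """Map a normalised envelope (0..1) to on/off flags for n_units unit stages.

    In a thermometer code the first k flags are 1 and the rest 0, so raising
    the envelope only ever switches additional stages on.
    """
    if not 0.0 <= envelope <= 1.0:
        raise ValueError("envelope must be normalised to [0, 1]")
    active = round(envelope * n_units)
    return [1] * active + [0] * (n_units - active)

print(thermometer_code(0.5, 8))
```

Since every unit stage is driven by the same constant-envelope phase signal, the output amplitude is set by the number of active stages.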

Shameli et al. (2008) used 6 control bits to both switch in a number of parallel output stages 
and at the same time change the supply voltage with a ΣΔ modulator. A 62 dB power 
control range was achieved, as well as a 27.8 dBm maximum output power and an average 
WCDMA efficiency of 26.5%. 

Current summing was also used by Kousai and Hajimiri (2009), utilizing 16 parallel power 
mixers and a transformer at the output. The phase information modulates the 
high-frequency digital LO signal. Linearization could be chosen to be analog, by sensing and 
feeding back the signal level for each mixer core, or digital, by using a thermometer code for 
the envelope signal, switching on and off mixer cores. Both the baseband and the LO signal 
were controlled digitally with a number of bits. A 16-QAM (Quadrature Amplitude 
Modulation) signal at 1.8 GHz and a symbol rate of 4 MSym/s was reproduced with an 
output power of 26 dBm, a PAE of 19% and an EVM (Error Vector Magnitude) of 4.9%. 

Presti et al. (2009) used 7-bit thermometer plus 3-bit binary-weighted current summing 
combined with analog input power control for low power levels. Relatively broadband 
operation, 800-2000 MHz, and a 70 dB power control range were achieved. With Digital 
Pre-Distortion (DPD), the WCDMA, EDGE and WiMAX specifications are all met. 

In these architectures no supply voltage modulator is used. Sufficient resolution to achieve a 
high linearity or amplitude accuracy is achieved by increasing the number of parallel stages. 
However, the efficiency of these current-summing amplifiers follows a class-B curve (Presti 
et al., 2009): 

$$\eta \propto \sqrt{P_{out}} \tag{5}$$
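
The consequence of Eq. (5) at power back-off can be made concrete with a small sketch. The exponent 0.5 corresponds to the class-B curve of Eq. (5); the exponent 1.0, included for comparison, is the textbook behaviour of an ideal class-A stage and is an addition here, not a result from the cited work. The function name and the peak efficiency value are hypothetical.

```python
def efficiency_at_backoff(peak_efficiency, backoff_db, exponent=0.5):
    """Efficiency at a given back-off (dB below peak output power).

    exponent=0.5: efficiency ~ sqrt(Pout), the class-B curve of Eq. (5)
    exponent=1.0: efficiency ~ Pout, ideal class-A behaviour (for comparison)
    """
    power_ratio = 10 ** (-backoff_db / 10)
    return peak_efficiency * power_ratio ** exponent

for backoff in (0, 6, 10, 20):
    eta_b = efficiency_at_backoff(0.5, backoff)
    eta_a = efficiency_at_backoff(0.5, backoff, exponent=1.0)
    print(f"{backoff:2d} dB back-off: class-B-like {eta_b:.3f}, class-A-like {eta_a:.3f}")
```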

Walling et al. (2008) used control bits to generate a suitable Pulse Width/Pulse Position 
(PWPM) signal, which was then provided to four class-E quasi-differential stages. In a 65 nm 
technology, a maximum output power of 28.6 dBm and a PAE of 28.5% were achieved at 
2.2 GHz with the output stage using a supply voltage of 2.5 V. For a non-constant-envelope 
π/4-DQPSK (Differential Quadrature Phase Shift Keying) modulated signal with a 192 kHz 
symbol rate, an output power of 27 dBm was achieved with an EVM of 4.6%. 

4.5 Direct RF modulation 

Another strategy to process the signal is to directly modulate the RF signal into the SM 
amplifier. For instance, a Pulse Width/Pulse Position modulator (PWPM) or a Sigma-Delta 
(ΣΔ) modulator can be used (Nielsen & Larsen, 2008; Wagh & Midya, 1999). This is depicted 
in Fig. 6. A major disadvantage, however, is that generally a high sampling or operating 
frequency is necessary, typically at least four times the carrier frequency, in order to achieve 
the desired resolution. This implies a large power consumption in the modulator, as this is 
directly proportional to the frequency. Moreover, since the PA switches more often, more 
switching loss will occur, reducing the efficiency. 



[Figure 6 diagram: blocks labelled "baseband signal (Cartesian)" and "VCO/clock gen."]

Fig. 6. Direct modulation of the RF signal by means of Sigma-Delta (ΣΔ) or Pulse Width 
Modulation (PWM). 

Wagh and Midya (1999) presented the concept of Pulse Width Modulation for RF. Nielsen 
and Larsen (2008), utilizing GaAs technology, used a feedback circuit and a comparator to 
generate an RF PWM signal. The signal's adjacent channel power ratio stayed well below 
the UMTS spectrum mask, allowing for some non-linearity from a subsequent PA. 

Direct modulation was also proposed by Jayaraman et al. (1998), utilizing a bandpass ΣΔ 
modulator, simulated with GaAs HBT technology. Discussions on efficiency were presented, 
and it was indicated that the linearity demands were moved from the PA to the ΣΔM. 



4.6 Cartesian modulation 

Even though polar modulation has some distinct efficiency advantages, Cartesian 
modulation may be used as an alternative; that is, the I and Q baseband signals, which differ 
90° in phase, are each processed in the transmitter and then either summed directly before 
the PA or amplified separately and combined after the amplifiers. An advantage is that the 
signal is not transformed into its amplitude and phase components, a non-linear 
transformation that puts tough requirements on the delay and recombination of the two 
signals. 

Bassoo et al. (2009) have proposed a combination of Cartesian and polar modulation, where 
the SMPA input signal is a ΣΔ-modulated Cartesian signal divided by the amplitude signal, 
which may be more or less bandlimited (see Fig. 7). Analysis showed that the envelope 
signal can be limited to 75% of the channel bandwidth without impairing the efficiency, still 
keeping OFDM clipping limited and EVM very low. Thus, a combination of EER and 
PWPM can be used to have a high efficiency over a wide range of output power while 
avoiding the bandwidth expansion of polar modulation. 

Fig. 7. Simplified architecture presented in Bassoo et al. (2009) for a combined polar and 
Cartesian modulator. 

4.7 Efficiency comparison 

Simulations have been performed on a 130nm CMOS class-D switched-mode amplifier, in 
order to compare the drain efficiency versus output power of the different architectures that 
have been discussed in the previous sections, such as Envelope Elimination and Restoration 
(EER), Envelope Tracking (ET), and Pulse Width Modulation by Variable Gate Bias 
(PWMVGB). Moreover, hypothetical curves for class-A and class-B operation have been 
drawn (see Fig. 8), with the peak efficiency as starting point. Class-A represents linear 
amplifier operation while class-B can be said to represent current-summing architectures. 
Not unexpectedly the EER and ET architectures perform best, showing the highest efficiency 
for lower output power ranges. It may thus be concluded that the use of supply modulation 
is desirable for high average efficiencies. However, it can also be seen that efficiency remains 
a challenging aspect, especially taking into account numerous other requirements such as 
linearity, channel bandwidth, multi-mode/multi-standard operation and output power 
control range. 



5. Summary 

Only fairly recently has CMOS technology become a competitive alternative for 
integrated circuit power amplifier design for wireless communication handsets, as CMOS 
was previously not suitable for PA design due to frequency, output power, efficiency and 
linearity requirements. Thus, stand-alone PAs have long been manufactured in specialized 
technologies. Nowadays, however, CMOS has evolved to operating frequencies far into the 
GHz range, and many of the limitations, such as efficiency when used as a linear 
amplification element, can be compensated by more digital control. Thus, a higher level of 
integration and more complex transmitter design result. However, the trend in CMOS 
technology development is to reduce device dimensions and, as a consequence, breakdown 
voltage. This complicates CMOS power amplifier design. 



Fig. 8. Simulated drain efficiency for a CMOS class-D amplifier in different architectures, 
such as Envelope Elimination and Restoration (EER), Envelope Tracking (ET), and Pulse 
Width Modulation by Variable Gate Bias (PWMVGB). Class-A and class-B curves serve only 
as an illustration. The amplifier operated on a 1.2 V supply and the input signal had a 
frequency of 2 GHz. (a) The output power (x-axis) represented in dBm. (b) The output 
power in mW. 
Transmitter architectures using polar signals have gained in popularity, as splitting the 
Cartesian signal into a low-frequency envelope signal and a high-frequency phase signal 
provides an excellent opportunity for efficiency improvements because a non-linear power 
amplifier can be used. A number of different polar architecture implementations exist, both 
digital and analog. However, signal bandwidth and supply requirements are challenging 
aspects of such designs. Other strategies have thus been used to avoid supply voltage 
modulation, such as switched control of the supply voltage or variable gate bias. Moreover, 
direct RF modulation can be used, implemented as a sigma-delta or pulse width modulator 
at high operating frequency. Recently, design strategies such as current steering have gained 
interest for use in PA and transmitter design. Digital control bits are used to generate a 
scaled output current, providing a high output power without straining the devices. 
However, efficiency over a wide range of output power is still a challenging aspect of 
transmitter design, especially if other requirements such as linearity, power control, 
multi-mode/multi-band operation and channel bandwidth must be fulfilled simultaneously. 

5.1 Future outlook 

As CMOS technologies continue to develop to dimensions well below 65 nm, special devices 
suitable for high supply voltage will likely continue to be provided, for example using 
high-k metal gate material. Such devices can be used on the same chip as digital circuits with 
clocking speeds of several GHz. Moreover, other substrate types may be used more 
extensively, such as Silicon-on-Insulator substrates. As they are less lossy, this may provide 
efficiency improvements. 

On the other hand, performance requirements will continue to rise with the development 
and maturing of wireless communication systems, especially because of the desire to cover 
more and more standards in one handset (multi-mode/multi-standard operation). Digital 
control may be used to accommodate greater flexibility, reconfigurability and on-chip 
calibration in transmitter design. Moreover, techniques may be used to increase the 
adaptivity of components such as antennas, duplexers, filters and matching networks. 

CMOS will continue to expand into the millimeter-wave range, with operating frequencies 
beyond 60 GHz. However, other technology developments may play an important role in 
future integrated circuit design for wireless communication, such as integrated RF MEMS 
(microelectromechanical systems). Devices such as carbon nanotubes may also be used for 
wireless applications. But such technologies have some way to go until they reach the level 
of integration that current CMOS technology has. 

1. Introduction 



This chapter describes the trend toward higher dimensionality in the structures of 
semiconductor memories and transistors, focusing on metal-oxide-semiconductor (MOS) 
devices. One-, two-, and three-dimensional (3-D) structures correspond, respectively, to the 
legacy grown-junction bipolar transistor, the planar MOS transistor, and the trench-capacitor 
dynamic random-access memory (DRAM) (Sunami et al., 1982-b & 1984). Flash memory has 
recently begun to employ 3-D stacks of memory cells (Endoh et al., 2001). 

To maintain sufficient margin in DRAM operation, the storage capacitance should be 
kept as large as possible despite the scaling of the memory cell area. In response to this 
requirement, the 3-D capacitor was introduced. Its capacitance can be increased by 
increasing the height of the capacitor without enlarging the planar area of the memory cell. 
The first commercially available trench-capacitor DRAM appeared in the mid-1980s, in the 
1-Mbit era, together with the stack-capacitor cell (Koyanagi et al., 1982). Recent 
stack-capacitor DRAMs have begun to use a 3-D cylindrical capacitor, as in the 
trench-capacitor cell. 

Meanwhile, the MOS transistor has been shrunk continually from 12 µm to 45 nm with a 
planar 2-D structure since the early 1970s. The empirical fact that a smaller MOS device 
leads to higher performance was formalized in the scaling theory (Dennard et al., 1968). 
However, harmful short-channel effects become obvious in the sub-µm channel length 
regime. It is predicted that the minimum commercially usable MOS device might be in the 
range of 5-10 nm. To cope with the short-channel effects, vertical-channel transistors such as 
the trench transistor (Richardson et al., 1981), the surrounding gate transistor, SGT (Takato 
et al., 1988), and DELTA (Hisamoto et al., 1991) were proposed. They have since been 
extensively investigated over the past ten years. It is expected that vertical transistors such 
as the FINFET (Choi et al., 2001) will soon be applied to products to overcome the 
short-channel effects of the 2-D transistor, leading to a new era of 3-D LSI. 

To summarize device trends in volume and size, the increase in device count per chip and 
the shrinkage of feature size are shown in Fig. 1. A more than one-million-fold increase in 
the device count has been achieved over the past 40 years, leading to almost the same 
increase in processor performance. This has driven the enormous development of 
electronics and information technology. 







[Figure 1 plot: x-axis "Year", 1950 to 2020.]



Fig. 1. Trends in device count per chip and feature size of MOS devices. A DRAM cell 
consists of two devices: a cell transistor and a storage capacitor. 

2. Grown-junction bipolar transistor as 1-D structure 

It is well known that the first transistor, invented in the mid-1940s, was a point-contact 
germanium bipolar transistor. The grown-junction bipolar transistor then became the first 
commercially successful semiconductor device (Teal et al., 1951). Although real devices are 
actually fabricated as 3-D structures, the operating mechanism of this bipolar transistor is 
based, in principle, on 1-D current flow. 

Bipolar devices were dominant in the semiconductor market until the early 1970s, when 
MOS devices took over their position, featuring lower power consumption, denser packing, 
superior thermal stability, etc. Bipolar devices still survive in limited applications such as 
high-frequency low-noise circuits, inexpensive small-scale ICs, and high-power devices. In 
addition, the insulated-gate bipolar transistor (IGBT) combines bipolar and MOS transistor 
structures. The IGBT exploits both the voltage-driven gate of the MOS transistor and the 
high current drivability of the bipolar transistor. The IGBT has become a major device in 
power electronics, for example for electricity control in electric and hybrid cars. 

No further description is given in this chapter, since the major topic here is "scaling and 
higher integration of semiconductor device." 



3. Invention of trench-capacitor DRAM cell as a quasi-3-D structure 

3.1 Advent of DRAM 

The first DRAM was introduced to the market in 1970 by Intel as a 1-Kbit chip using 
three-transistor DRAM cells (Regitz & Karp, 1970). Subsequently, early 4-Kbit DRAMs, 
which still employed the three-transistor cell, began to be installed in IBM's mainframe 
computers. This was the time when MOS devices were proven suitable for use as highly 
reliable main memory in mainframes. Until then, MOS devices had been regarded as 
insufficiently stable. 

A few years later, 4-Kbit DRAM using the one-transistor cell (Dennard, 1968) was being 
widely manufactured. This memory cell trend is shown in Fig. 2 in terms of the equivalent 
circuit configuration. Since the one-transistor cell was much smaller than the other two, 
very low-cost manufacturing was possible. Its low cost has contributed to the development 
of the personal computer. Since then, DRAM capacity has increased by a factor of four every 
three years. As modern computers are based on the von Neumann architecture, main 
memory is a key device together with the processor. Along with the growth of computing, 
the demand for memory has increased to produce a worldwide 30 billion dollar market in 
2009 for DRAM. Although the main application is still the personal computer, various other 
applications are extending DRAM's usage, e.g. cell phones, game machines, personal audio, 
and video machines. 
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Fig. 2. Trend in DRAM cell configuration. 
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3.2 Key factor of cost 

The strongest driving force behind the growth of the DRAM market is undoubtedly "price". 
Therefore, development efforts have focused on reducing manufacturing cost, primarily 
through finer patterning. The bit cost decreased by a factor of 10⁻⁶ over the 30 years since 
1970, and a 1-Gbit product has already been sold at a lower price than a 1-Kbit product once 
was. Since chip cost is closely related to the number of chips on a wafer, the wafer diameter 
has been continually increased: 50, 75, 100, 125, 150, 200, and now 300 mm. Together with 
the diameter increase, the memory cell size has been reduced to 1/3 in each DRAM generation 
in volume production to absorb the chip size increase. Consequently, the chip size has been 
enlarged by at most a factor of 10 despite the increase in bit count by a factor of 10⁶ from 
1 Kbit to 1 Gbit. The memory cell size has thus decreased by a factor of 10⁻⁵, as shown 
in Fig. 3. 
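
The arithmetic behind these factors can be sketched in a few lines. The following is a minimal illustration, assuming exactly ten generations between 1 Kbit and 1 Gbit, a capacity factor of 4 and a cell-area factor of 1/3 per generation, and three years per generation; the function and constant names are hypothetical. With these idealized ratios the cell array grows by roughly a factor of 18, the same order of magnitude as the chip size increase quoted above.

```python
# Generation-by-generation DRAM scaling, using only the ratios quoted in the text.
GENERATIONS = 10             # assumed: 1 Kbit -> 1 Gbit at x4 per generation
BITS_PER_GEN = 4             # capacity factor per generation
CELL_SHRINK_PER_GEN = 1 / 3  # cell-area factor per generation
YEARS_PER_GEN = 3            # one generation every three years


def scaling_summary(generations=GENERATIONS):
    """Return cumulative scaling factors after the given number of generations."""
    bit_factor = BITS_PER_GEN ** generations
    cell_factor = CELL_SHRINK_PER_GEN ** generations
    # The cell-array area scales as (number of bits) x (area per cell).
    array_area_factor = bit_factor * cell_factor
    return {
        "years": generations * YEARS_PER_GEN,
        "bit_factor": bit_factor,
        "cell_factor": cell_factor,
        "array_area_factor": array_area_factor,
    }


if __name__ == "__main__":
    for key, value in scaling_summary().items():
        print(f"{key}: {value:.3g}")
```

The idealized 1/3 shrink per generation gives a cumulative cell-size factor of about 1.7×10⁻⁵, consistent with the 10⁻⁵ trend stated in the text.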

3.3 Invention of trench-capacitor DRAM cell 

To limit chip size growth while memory capacity quadruples, the memory cell size has been 
reduced to almost one-third in each generation, as shown in Fig. 3. The DRAM cell, the 
so-called 1-transistor cell, consists of one cell transistor and one storage capacitor. Key 
specifications in DRAM operation, such as noise margin, soft-error durability, operational 
speed, and power consumption, strongly depend on the capacitance of the storage capacitor 
(Dennard, 1984). The capacitance value, C_s, is expressed as 

$$C_s = \frac{\varepsilon_i A}{T_i} \qquad (1)$$

where ε_i, A, and T_i are the permittivity of the storage insulator, the area of the capacitor 
electrode, and the insulator thickness, respectively. Therefore, cell size reduction through 
scaling reduces the area and consequently the capacitance value. 

Fig. 3. Memory cell size shrinkage at DRAM in volume production. 

To cope with the dilemma of cell size vs. capacitance, the insulator thickness was reduced by 
a factor of 10, from 100 nm in 1-Kbit chips to 10 nm in 1-Mbit chips, coming dangerously 
close to dielectric breakdown. In 1974, the author glimpsed a conference presentation from 
Texas Instruments Inc. introducing a highly efficient silicon solar cell with multiple steep 
trenches, as shown in Fig. 4 (a). Foreseeing the upcoming issue of cell size vs. capacitance, 
he got the idea of a trench-capacitor DRAM cell. Even though his job at that time was to 
characterize silicon surfaces with photoemission spectroscopy, his amateur-radio hobby 
connected the shape of a trimmer condenser, which has two opposing coaxial cylindrical 
electrodes as illustrated in Fig. 4 (b), with the needs of the 1-transistor cell. From that idea, 
he invented a trench capacitor cell and applied for a Japanese patent in 1975 (Sunami and 
Nishimatsu, 1975). Because the invention received a low assessment score, no overseas 
patent application was filed. 

Fig. 4. Hints that led to the trench-capacitor DRAM cell concept: (a) a proposed solar cell 
with steep trenches, (b) a photograph of a trimmer condenser, and (c) its equivalent model. 

As Hitachi had won a leading position in 64-Kbit DRAM products with a single 5-V power 
supply (Itoh et al., 1980) and a folded bit-line arrangement (Itoh, 1975), its research and 
development group could afford to take on novel cell development together with the 
development of integration processes at the 1.3-µm technology node. After several years of 
development, the first 1-Mbit-level trench cell was successfully implemented in trial 
production and then presented at IEDM (Sunami et al., 1982). The development story is 
described hereafter. 

The memory cell obtained measured 4 µm by 8 µm with a 2.5-µm-deep trench. The capacitor 
insulator is a triple layer of SiO2/Si3N4/SiO2 whose thickness was equivalent to 15 nm of 
SiO2. The resulting capacitance per unit area was 2.2 fF/µm². The storage capacitance, C_s, 
obtained with a trench 2.5 µm in depth and the bit-line capacitance, C_B, were 45 and 400 fF, 
respectively, with a 128-bit folded bit-line arrangement. The resulting C_s/C_B ratio is 0.11. 
This value is sufficiently large for stable operation. A scanning-electron micrograph of a 
cross-section of the cell is shown in Fig. 5. 
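
As a rough cross-check of these figures, Eq. (1) can be evaluated numerically. The sketch below assumes a relative permittivity of 3.9 for SiO2 (a textbook value, not stated in this chapter) and treats the triple-layer film as a uniform 15-nm SiO2 equivalent; all function names are hypothetical. Under these assumptions the ideal capacitance per unit area comes out near 2.3 fF/µm², close to the 2.2 fF/µm² reported, and the ratio of the storage capacitance to the bit-line capacitance is about 0.11.

```python
EPS0_F_PER_M = 8.854e-12  # vacuum permittivity in F/m
K_SIO2 = 3.9              # assumed relative permittivity of SiO2 (textbook value)


def cap_per_area_fF_per_um2(t_ox_nm, k=K_SIO2):
    """Eq. (1) per unit area: eps / T, converted from F/m^2 to fF/um^2."""
    farad_per_m2 = EPS0_F_PER_M * k / (t_ox_nm * 1e-9)
    return farad_per_m2 * 1e15 * 1e-12  # 1e15 fF per F, 1e-12 m^2 per um^2


def capacitance_ratio(cs_fF, cb_fF):
    """Ratio of storage capacitance to bit-line capacitance."""
    return cs_fF / cb_fF


def effective_area_um2(cs_fF, c_per_area_fF_per_um2):
    """Electrode area (planar part plus trench walls) implied by a capacitance."""
    return cs_fF / c_per_area_fF_per_um2


if __name__ == "__main__":
    print(f"ideal C/A for 15 nm SiO2: {cap_per_area_fF_per_um2(15):.2f} fF/um^2")
    print(f"storage / bit-line ratio: {capacitance_ratio(45, 400):.4f}")
    print(f"implied electrode area:   {effective_area_um2(45, 2.2):.1f} um^2")
```

The implied electrode area of roughly 20 µm² includes the trench side walls, which is how a 4 µm by 8 µm cell can hold 45 fF.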

This first trial 1-Mbit cell array with trenches achieved a signal voltage of 200 mV at a 5-V 
power supply, as shown in Fig. 6. Since a signal voltage of around 100 mV was empirically 
regarded as sufficient in those days, the obtained S/N ratio was large enough for stable 
DRAM operation. Therefore, high immunity to alpha-particle hits was strongly expected for 
the coming megabit DRAM products, until actual soft-error measurements were made. 




Fig. 5. An SEM cross section of the memory-cell array of a 1-Mbit DRAM in trial production. 
The memory cell measures 4 µm by 8 µm. 



3.4 Changes of trench cell employment 

In the R&D project, the 1-Mbit DRAM was the prime vehicle to drive 1.3-µm node MOS 
technologies such as lithography, dry etching, film deposition, gate material selection, and 
metallization, with the exception of packaging. In the course of trench DRAM development, 
several issues arose. The major ones were 

a. degraded oxide uniformity on the trench wall, which degrades oxide integrity, 

b. trench-to-trench leakage, which limits denser packing of cells, and 

c. increased soft errors, which are fatal in reliability-conscious computer applications. 

Fig. 6. Output signals of a sense amplifier for a 128-bit folded bit-line cell array with a trench 
2.5 µm in depth and for one without a trench. 

Besides these major issues, formation of a neat trench shape, avoidance of dislocation 
formation at the bottom of the trench, high-energy boron implantation into the deeper portion 
of the substrate, uniform capacitor film deposition, polysilicon filling of the trench with 
phosphorus doping of the polysilicon, etc. had to be solved in a limited period. 

a. Oxide uniformity 

The crystallographic orientation of the trench surface varies, resulting in different oxidation 
rates. The oxidation rate was observed to decrease in the order (110) > polysilicon > (111) > 
(100). Even though the oxidation-rate enhancement is more pronounced at lower oxidation 
temperatures (Sunami, 1978), the phenomenon persists at relatively high temperatures. Thus 
the dielectric breakdown voltage was lowered at the thinnest portion of the oxide on the 
trench wall. A transmission-electron micrograph of an experimental result for dry oxidation 
at 1000°C is shown in Fig. 7. 

As a drastic solution to this problem, it is desirable to use chemical-vapor deposition (CVD). 
A SiO2/Si3N4/SiO2 film was used in trial production. The thickness ratio should be carefully 
chosen to avoid a non-volatile memory effect due to the Si3N4/SiO2 interface. 

Fig. 7. A transmission-electron micrograph of oxide thickness variation on the trench wall 
after dry oxidation at 1000°C. 

b. Trench-to-trench leakage 

Another serious obstacle to further scaling was a leakage current flowing through the deeper 
portion between two adjacent trenches. The current gradually fills an empty cell with charge, 
turning "1" into "0". This is a fatal failure for DRAM. It is attributed to a parasitic MOS 
transistor spreading over two adjacent memory cells: the two trenches work as deep source 
and drain, the capacitor plate is the gate, and the thick field oxide is the gate oxide. This can 
be regarded as a typical MOS transistor that simply causes a large punch-through current in 
the deeper portion between source and drain. 

To outline the leakage current qualitatively, two-dimensional device simulation using 
CADDET (Toyabe, 1978) was carried out (Sunami et al., 1985). The resulting potential 
distribution with leakage current flow and a method of leakage current suppression are 
shown in Fig. 8 (a) and (b), respectively. One notable fact in the simulation result is that the 
current flows in the deeper portion of the substrate and a potential mound is located at the 
substrate surface. These results may be attributable to the field implantation, whose peak 
concentration lies at the surface. 





Fig. 8. Leakage current characteristics for two adjacent trenches. In (a), equipotential curves 
are denoted by solid lines and leakage current paths by broken curves. A method of leakage 
suppression with a p-type well formed by boron ion implantation is shown in (b). 

It is well known that a punch-through stopper with a relatively high dose of p-type dopant 
can suppress the punch-through current. As shown in Fig. 8 (b), the leakage current decreases 
as the boron implantation dose increases. Since the impurity concentration of the substrate is 
1.5×10¹⁵ cm⁻³, boron implantation doses of 1, 5, 7, and 10×10¹¹ cm⁻² generate acceptor 
concentrations of 1.5, 1.9, 2.1, and 2.5×10¹⁵ cm⁻³ at the 3-µm-deep portion between two 
adjacent trenches. Note that even a small increase in the concentration can drastically reduce 
the leakage current. 

One radical solution is to provide a storage node isolated from the current path in the 
substrate. Substrate-plate and sheath-plate configurations are good candidates, which will be 
described later. 

c. Soft-error 

In the final stage of the development, the most serious problem was found: soft errors caused 
by alpha-particle hits, as shown in Fig. 9. A difference of a few orders of magnitude was 
observed between the planar and the trench cells in cell-failure mode, while the same 
performance was observed for both in bit-line failure mode. This is because the bit-lines were 
formed in the same configuration. It was observed that the trench cell with a 40% increase in 
signal charge provided the same soft-error rate as the planar cell. Even though the trench cell 
provides stable DRAM operation due to its larger signal charge, it loses the advantage of the 
increased signal charge as far as soft errors are concerned. 

Fig. 9. Measured soft error rates of planar and trench cells. 

One alpha particle at the maximum energy of 5 MeV generates almost one million 
electron-hole pairs. One million electrons amount to about 190 fC, which is almost equivalent 
to the signal charge stored in one storage capacitor of a 1-Mbit DRAM cell. Due to the 
extended depletion layer of the storage capacitor in the trench cell, it "effectively" collects 
the generated electrons, as shown in Fig. 10. 
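
The orders of magnitude involved can be checked with a short calculation. The sketch below assumes an average energy of 3.6 eV per electron-hole pair in silicon (a commonly quoted value, not given in this chapter) and a stored charge estimated as the 45-fF storage capacitance times the full 5-V supply; both are assumptions made only for illustration, and the function names are hypothetical. With these inputs, one million electrons correspond to about 160 fC and a 5-MeV particle to roughly 220 fC, so the figure quoted in the text and the stored charge all lie in the same range of a few hundred femtocoulombs.

```python
ELEMENTARY_CHARGE_C = 1.602e-19  # charge of one electron in coulombs
PAIR_ENERGY_EV = 3.6             # assumed mean energy per e-h pair in silicon


def pairs_generated(alpha_energy_eV, pair_energy_eV=PAIR_ENERGY_EV):
    """Number of electron-hole pairs if all the energy is deposited in silicon."""
    return alpha_energy_eV / pair_energy_eV


def charge_fC(n_electrons):
    """Total charge of n electrons, in femtocoulombs."""
    return n_electrons * ELEMENTARY_CHARGE_C * 1e15


def stored_charge_fC(cs_fF, voltage_V):
    """Stored charge Q = C * V, in femtocoulombs (fF x V = fC)."""
    return cs_fF * voltage_V


if __name__ == "__main__":
    n_pairs = pairs_generated(5e6)
    print(f"pairs from a 5-MeV alpha: {n_pairs:.3g}")
    print(f"charge of those pairs:    {charge_fC(n_pairs):.0f} fC")
    print(f"charge of 1e6 electrons:  {charge_fC(1e6):.0f} fC")
    print(f"stored charge, 45 fF x 5 V (assumed): {stored_charge_fC(45, 5):.0f} fC")
```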

In addition to the soft-error problem, it was predicted that punch-through current between 
any two adjacent capacitors would soon limit further shrinkage of the cell. This was a serious 
decision point: should the trench cell be improved or abandoned? 

In those days, most DRAM manufacturers strove to supply their DRAM products to a very 
limited number of leading mainframe makers. Acceptance was a kind of certificate that their 
products had achieved first-grade reliability, and the certificate surely made their business 
fruitful. With even a half-year delay in product shipment, they might lose their business in 
the mainframe market for one DRAM generation. There is clear evidence that the leading 
maker has changed in each DRAM generation: Intel at 1 K, then TI, MOSTEK, Hitachi, NEC, 
Toshiba, Samsung ... 

Fig. 10. A model of electron-hole pair generation by an alpha-particle hit. 

Since Hitachi was a DRAM manufacturer as well as a mainframe supplier, it focused keenly 
on mainframe applications demanding the highest-grade reliability, rather than on 
personal-use electronic appliances with relatively low reliability requirements. Thus, Hitachi 
abandoned the trench cell, putting aside several ideas already proposed by the device 
development group for improved structures to reduce the soft-error problem (Sunami, 
2008-a). Additional development was thought to need more than half a year. Since leading 
mainframe makers accepted only a few DRAM suppliers, shipping a new product with a 
half-year delay would have been fatal. 

In the same period, a new array operation configuration with a half-Vcc plate (Kumanoya et 
al., 1985) was proposed, as shown in Fig. 11. It offers a strong potential for doubling the 
storage capacitance. As a result, it could prolong the use of the conventional planar cell. This 
proposal also influenced Hitachi's decision that the conventional planar cell should be 
applied to 1-Mbit DRAM products. In general, a conventional structure with conventional 
fabrication technologies is strongly desirable from the manufacturability and cost points of 
view. 

Fig. 11. The half-Vcc plate configuration can double the signal charge while keeping the 
maximum electric field strength applied to the capacitor insulator. 

Meanwhile, several innovative trench cells were proposed and developed in the same 
project to drastically improve soft-error immunity. The prime key factor was to avoid the 
inflow of charges generated by an alpha hit. Those cells were the substrate-plate cell (Sunami 
et al., 1982-a) and the sheath-plate cell (Kaga et al., 1988), as shown in Fig. 12. The storage 
nodes of these cells are surrounded by the capacitor insulator and thereby isolated from 
charges generated in the substrate by an alpha-particle hit. The portion of the p-n junction 
exposed to generated charges is very small, as clearly illustrated in the figure. The 
substrate-plate trench cell remarkably improves soft-error tolerance due to its highly shrunk 
depletion layer. 



Fig. 12. Proposed DRAM cells that drastically improve immunity to soft errors caused by 
alpha-particle hits: (a) substrate plate and (b) sheath plate. Storage nodes are isolated from 
the substrate by the capacitor insulator. 

Despite the alpha-immunity problem, several major manufacturers adopted the trench cell 
and have been improving the structure to this day. Together with the trench cell, the stacked 
capacitor cell was also applied in products. In addition to these cell structure innovations, 
the hemispherical-grain (HSG) structure (Watanabe et al., 1992) became an indispensable 
technique, doubling the storage capacitance through increased surface area. 



3.5 DRAM cell trend 

Major advances in cell innovation are shown in Fig. 13. The cylinder-type stacked cell and 
the substrate-plate trench cell, both with HSG, are the major cells being manufactured today. 
These DRAM cell innovations are divided into three phases. 

Phase I (1K → 1M): Shrinkage of the planar area of the memory cell together with a decrease 
in capacitor insulator thickness. Thinning of the insulator finally brought about 
catastrophic dielectric breakdown of the insulator. Even with the half-Vcc 
configuration, the planar cell could not survive into the 4-Mbit era. 

Phase II (1M → 1G): 3-D capacitor structure with a planar cell transistor. In principle, the 
capacitance does not suffer from planar area shrinkage. Two categories, stacked and 
trench capacitor cells, were proposed. In the latter part of Phase II, high-k 
materials became indispensable to maintain the capacitance value against cell area 
shrinkage. 

Phase III (1G → 1T): Three-dimensional stack of the capacitor and the cell transistor. This 
will be described later in section 4.5. 



Fig. 13. DRAM cell trend. Phases I, II, and III correspond to planar area shrinkage, the 3-D 
capacitor, and the 3-D stack of cell transistor and storage capacitor, respectively. 

A typical memory cell of a commercially available 1-Gbit-level DRAM is shown in Fig. 14 
(Sunami, 2008-c). It shows one possible combination of various technologies; this virtual 
structure is not necessarily identical to that of any real commercial product. The extended 
channel length with a trench gate aims at much lower subthreshold current to maintain 
sufficient refresh time. A relatively low concentration of n-type dopant at the junction also 
provides lower leakage current due to the reduced electric field across the junction. Since 
there will certainly be an ultimate limit to the size of the hemispherical grain, the diameter of 
the cylinder will also cease to shrink because of the grain size. 



3.6 Material revolution 

From 1 K to 1 M, size scaling was the key issue. The storage capacitance value was kept 
almost the same over several DRAM generations by reducing the insulator thickness to 
compensate for memory cell shrinkage. Consequently, the reduced thickness brought the 
electric field across the insulator close to 5 MV/cm, which was recognized as the upper 
limit for maintaining insulator integrity and refresh time in DRAM operation. Thus, 
innovative techniques other than thickness reduction were strongly required. 
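
The 5 MV/cm figure follows directly from the numbers given earlier in this chapter. As a minimal sketch, assuming a uniform field E = V/T across the insulator and the full 5-V supply applied across the 10-nm film of the 1-Mbit generation (function names are hypothetical), the field is exactly 5 MV/cm. The same estimate suggests why the half-Vcc plate of section 3.4 helps: halving the applied voltage halves the field, or allows half the thickness at the same field.

```python
def field_MV_per_cm(voltage_V, thickness_nm):
    """Uniform-field estimate E = V / T, converted to MV/cm."""
    volts_per_cm = voltage_V / (thickness_nm * 1e-7)  # 1 nm = 1e-7 cm
    return volts_per_cm / 1e6


def min_thickness_nm(voltage_V, max_field_MV_per_cm=5.0):
    """Thinnest insulator that keeps the field at or below the given limit."""
    thickness_cm = voltage_V / (max_field_MV_per_cm * 1e6)
    return thickness_cm * 1e7


if __name__ == "__main__":
    print(f"5 V across 10 nm:   {field_MV_per_cm(5.0, 10):.2f} MV/cm")
    print(f"2.5 V across 10 nm: {field_MV_per_cm(2.5, 10):.2f} MV/cm")
    print(f"thinnest film at 2.5 V: {min_thickness_nm(2.5):.1f} nm")
```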

In response, three-dimensional structures were proposed. From 1 M to 1 G, 
three-dimensional structure innovation was achieved, as previously shown in Fig. 13. 
However, as the aspect ratio of the storage capacitor exceeds 10, manufacturability becomes 
a much more serious issue. The final parameter to be handled in the relation expressed in 
Eq. (1) is the permittivity, ε_i. Thus, various kinds of high-k materials have been developed, 
as shown in Fig. 15. However, there is the serious fact that the thinner the film is, the lower 
its permittivity is. An empirical relation between the leakage current, I_leak, and the barrier 
height in a silicon-insulator system is expressed as 

$$I_{leak} \propto \exp\left[-(m\phi)^{1/2}\,T\right] \qquad (2)$$

where m, φ, and T are the effective mass, barrier height, and film thickness, respectively. 
Thus, high-k film may not be a unique ultimate solution at this moment. A material 
revolution with ultra-high-k materials is called for to extend DRAM further toward terabit 
capacity on a chip. 

Fig. 14. A typical 1-Gbit-level DRAM cell utilizing various kinds of proposed technologies. 
This may not necessarily be the exact memory cell in commercially available products. 

To summarize the innovations achieved in the past and the requirements for the future, there 
are three eras of DRAM development. 

• 1 K to 1 M: dimension improvement; smaller cell and reduced insulator thickness. 

• 1 M to 1 G: structure or material innovation; stack or trench cell with high-k film. 

• 1 G to 1 T: 3-D stack; cell transistor and storage capacitor with material revolution. 

The final parameter that affects advanced shrinkage of the cell should be the insulator 
thickness itself. If the insulator is thick enough to fill the internal hole of the trench in the 
trench cell or of the cylinder in the stacked cell, the capacitor plate cannot penetrate inside 
the trench or the cylinder, and no capacitor is formed (Itoh et al., 1998), as shown in Fig. 16. 
In this sense, high-k films should be thin enough while simultaneously keeping their high k 
value. This may be the deadlock for realizing smaller cells of the 1-transistor DRAM type. 
Even with today's cutting-edge high-k films, 32 or 64 Gbit will be the largest DRAM 
capacity on a chip without chip stacking. Novel main memory candidates are hoped for in 
the near future. 




(a) Cylinder: 2F = 2T + 2T_S + T_P + T_G 

(b) Pillar: 2F = 2T + T_S + T_P 

Fig. 16. Relations among several film elements constructing the storage capacitor. 

4. Two- and three-dimensional MOS transistors 

Since integrated circuits, particularly MOS memories and processors, were introduced to the 
market in the early 1970s, an almost four-fold increase in both memory capacity and 
processor performance has been continually achieved every three years, as previously shown 
in Fig. 1. The strongest driving force for the increase is undoubtedly "cost", as described in 
section 3.2. The capacity increase has been attained mainly by shrinkage of all components 
on a chip. The MOSFET (metal-oxide-semiconductor field-effect transistor) is particularly 
suitable for shrinkage because the scaled transistor provides better performance. This 
behavior was theoretically analyzed (Dennard et al., 1974) and later named the "scaling 
principle" in the semiconductor industry. 
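
The scaling principle can be summarized as a small table of factors. The sketch below lists the standard textbook form of idealized constant-field scaling for a scaling factor k > 1; the factors are the commonly cited ones and are not taken from this chapter, and the function name and dictionary keys are hypothetical labels chosen for illustration.

```python
def constant_field_scaling(k):
    """Idealized constant-field scaling factors for a scaling factor k > 1."""
    return {
        "dimensions (L, W, t_ox)": 1 / k,
        "supply voltage": 1 / k,
        "substrate doping": k,
        "electric field": 1.0,
        "gate capacitance": 1 / k,
        "drain current": 1 / k,
        "gate delay (C*V/I)": 1 / k,
        "power per circuit (V*I)": 1 / k**2,
        "circuit density": k**2,
        "power density": 1.0,
    }


if __name__ == "__main__":
    for name, factor in constant_field_scaling(2.0).items():
        print(f"{name:26s} x{factor:g}")
```

In this idealized picture the gate delay improves by 1/k while the power density stays constant, which is why a scaled transistor "provides better performance"; the short-channel effects discussed next are the departures from this ideal.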



4.1 Innovation of 2-D transistors 

Even though the scaled transistor provides better performance, various problems become 
more serious with scaling. These are the so-called "short-channel effects": drain-to-source 
breakdown voltage decreases; hot-carrier immunity worsens; subthreshold current 
increasingly degrades cut-off performance; gate leakage current increases as the gate oxide 
becomes thinner; and mobility degradation undermines the benefit of scaling itself. 

To cope with these short-channel effects, the 2-D transistor structure has been improved, as 
shown in Fig. 17, so that the electric field in the vicinity of the drain is reduced. A high 
electric field results in increased leakage current and reduced source-to-drain breakdown 
voltage. Thus DD was developed to reduce the electric field with a more graded impurity 
profile around the n+ drain. However, the graded impurity profile increases punch-through 
current in the deep portion between source and drain. 

Fig. 17. Improvement of MOS transistor structure regarding source and drain regions. SD, 
DD, LDD, HDD, and SOI denote single drain, double drain, lightly-doped drain, 
highly-doped drain, and silicon-on-insulator, respectively. 

Then, LDD was developed to suppress the punch-through current by locating graded 
impurity profile regions only at the edges of the drain and source, as shown in Fig. 17. 
Because of the relatively high resistivity of the graded regions, LDD's drivability was 
unsatisfactory owing to the relatively high series resistance between source and drain. HDD 
was then developed to reduce this effect. 

Even with these innovations, the mobility degradation problem remained. Based on the 
physical fact that tensile and compressive strains enhance electron and hole mobilities, 
respectively, strained MOS transistors were proposed (Kesan et al., 1991; Ismail, 1995). 
Typical strained-silicon MOS transistors are shown in Fig. 18. 

The strained transistor in Fig. 18 (a) consists of an SOI structure with a Si-Ge layer 
underneath the source and drain. Since a silicon overlayer has to be epitaxially deposited on 
the Si-Ge layer, the complicated fabrication process is likely to delay its practical use. 

As a more practical structure, the use of compressive and tensile chemical-vapor-deposited 
(CVD) silicon-nitride films was proposed (Pidin et al., 2004), as shown in Fig. 18 (b). Stresses 
of about -2 and +2 GPa were successfully introduced into the p- and n-channel regions, 
respectively; minus and plus signs denote compressive and tensile stress, respectively. Even 
though the actual deposition methods of the films were not disclosed at the meeting, it is 
presumed that the tensile strain is introduced by thermal-decomposition CVD whereas the 
compressive strain is given by plasma-enhanced CVD. An increase of almost 50% in carrier 
mobility was obtained for both n- and p-channel transistors. 

Fig. 18. Typical strained-silicon MOSFETs: (a) SiGe buried layer and (b) CVD SiN cap films. 



4.2 Proposals of quasi 3-D transistors 

To cope with short-channel effects, which will become more and more serious as 
conventional 2-D transistors are scaled, transistors whose channels are formed on both side 
walls of a silicon beam were proposed: the trench-isolated transistor using side-wall gates, 
TIS (Hieda et al., 1987), and the fully depleted lean-channel transistor, DELTA (Hisamoto et 
al., 1989), as shown in Fig. 19 (a) and (b), respectively. Because the current flows 
horizontally, this kind of transistor is called "quasi 3-D" in this article. 

In TIS, the side walls were not fully used, while in DELTA the main channel was formed on 
the side walls of the thin silicon beam. Because the bottom of the silicon beam is fully 
oxidized with the local oxidation of silicon (LOCOS) process, the beam is isolated from the 
silicon substrate, as in an SOI substrate. Advantages of the thin silicon channel were 
estimated. 

Fig. 19. Proposed quasi 3-D transistors: (a) the trench-isolated transistor using side-wall 
gates (TIS) and (b) the fully depleted lean-channel transistor (DELTA). 

The author's group has proposed several quasi-3-D devices. One of them is the 
corrugated-channel transistor, CCT (Furukawa et al., 2003; Sunami et al., 2004), shown in 
Fig. 20. Multiple beam channels with {111} surfaces are formed by crystallographically 
preferential etching with tetramethylammonium hydroxide (TMAH). An atomically flat 
channel surface can thus be formed, which is expected to reduce mobility degradation by 
avoiding a rough channel surface. 

The current drivability of the CCT is proportional to the number of beams, as shown in 
Fig. 21. This is suitable for area-conscious applications such as power transistors and/or 
high-voltage transistors. 



Fig. 20. A corrugated-channel transistor, CCT. Figure labels: beam thickness; number of 
beams (3 in this device); comb-shaped multi Si fins. 

Another proposal is the super self-aligned triple-gate transistor (Okuyama et al., 2007), 
shown in Fig. 22. As the two sidewall gates are delineated using the top gate as an etching 
mask, the three gates are self-aligned to each other, leading to much smaller area occupation 
on a silicon die. An example of device performance is shown in Fig. 23. The three gates 
operate three transistors independently with a common source and drain. In single-gate 
operation, the subthreshold current can be controlled by the two side gates; namely, a 
variable-threshold-voltage transistor can be realized within a certain voltage range. 




Fig. 21. Drivability of the corrugated-channel transistor, CCT, in terms of planar area 
(horizontal axis: planar area in µm²). 

Fig. 22. Super self-aligned triple-gate transistor featuring three gates (top gate, side gate-1, 
and side gate-2) formed in a self-aligned manner. 

Fig. 23. Drain current characteristics of the triple-gate transistor. The three gates provide 
three independent transistors with a common drain and a common source. 

In these quasi-3-D transistors, several serious issues arise from the formation of a tall, thin, 
steep silicon beam: (1) delineation of a steep vertical silicon beam, (2) conformal gate 
material formation, (3) low-resistance source and drain, and (4) low-resistance contacts to 
source and drain. The former two can be solved by advanced lithography with a multi-level 
resist technique, CVD, and dry etching with high material selectivity. The latter two may be 
achieved by silicidation of the silicon beam and a wrapped metal contact, as shown in 
Fig. 24. 

The figure illustrates the current paths of the beam-channel transistor. With the top contact, 
shown in Fig. 24 (a), the current paths through the relatively high-resistivity region are 
obviously longer. With the wrapped contact, shown in Fig. 24 (b), the current paths are 
relatively short. 

Simulated drain currents and transconductances for typical impurity concentration and 
silicidation are shown in Fig. 25 (Matsumura et al., 2007). The top-contact transistor 
structure sacrifices the advantage of the beam-channel transistor to a considerable extent. 

Fig. 24. One of the drain current characteristics of the triple-gate transistor at two modes of 
gate voltage application. 

Fig. 25. Simulated drain current and transconductance of transistors with top contact and 
wrapped contact. Transistor structures are shown in Fig. 24. 



4.3 Proposal of 3-D transistors 

To summarize the quasi-3-D transistors described above, a possible scenario of transistor 
structure innovation is illustrated in Fig. 26. Transistors with horizontal current flow inside 
a silicon beam are today called FinFETs (Choi et al., 2001). 3-D FETs with vertical current 
flow will then be the next candidate for 3-D LSI. 

With respect to the vertical transistor, a few DRAM cells utilizing a vertical current flow 
structure were already proposed in the mid-1980s: the trench-transistor cell, TTC 
(Richardson et al., 1985), and the surrounding-gate transistor, SGT (Takato et al., 1988). 
However, they have not yet been manufactured in real products. One reason is probably that 
the fabrication technologies have generally not yet matured. 

Fig. 26. Recent trend in transistor structure: (a) partially depleted SOI FET, (b) fully depleted 
SOI FET, (c) FinFET, and (d) vertical FET. As of 2009, neither FinFETs nor vertical FETs have 
been reported as shipped to the semiconductor market. 

These structures may be almost the smallest configuration for a one-transistor DRAM cell. 
The theoretical area of these cells is 4F², where F is the feature size of the device, in other 
words, the technology node itself. In conventional array configurations, the theoretical 
memory cell sizes of the open bit-line and folded bit-line arrangements are 6F² and 8F², 
respectively. A vertical stack cell as shown in Fig. 27 (c) will be one of the most promising 
structures in the near future. 

Fig. 27. Proposed vertical cell transistors applied to the one-transistor DRAM cell. 



4.4 A vertical transistor having a potential of 2F 2 cell area 

The author's group has proposed a super pillar transistor, SPT which has a potential of 
realizing 2F 2 DRAM cell (Sugimura et al., 2008). This SPT can double the packing density of 
DRAM cell as compared to 4F 2 cells previously shown in Fig. 27. A bird's eye view of SPT is 
shown in Fig. 28. 

The fabrication process flow is as follows. Selected portions of a silicon beam are covered with 
CVD Si₃N₄ films. High-temperature oxidation is then performed at 1000 °C until the beam is fully 
oxidized. The portions not covered with the Si₃N₄ films are converted into SiO₂, leaving 
physically and electrically separated silicon pillars. Subsequently, gate oxidation is performed 
and a gate film is deposited over the entire surface. Directional dry etching is then applied to 
the whole wafer, leaving two gates on both sides of the beam as etching residues. The resultant 
structure is shown in Fig. 28. 

Fig. 28. A fundamental process sequence to fabricate the super pillar transistor, SPT. The pillar 
is isolated with field oxide, which is converted from the silicon beam itself by the well-known 
local oxidation of silicon (LOCOS) technique. Side gate-1 and side gate-2 are self-aligned to the 
silicon and oxide beam. 

An SEM plan view of the SPT is shown in Fig. 29. Although the field SiO₂ film is initially 
twice as thick as the silicon beam, removal of the Si₃N₄ film and sacrificial oxidation 
reduce its thickness by about half. Thus the field oxide thickness shown in Fig. 29 is 
almost equivalent to that of the silicon pillar. 




Fig. 29. SEM images of (a) a bird's-eye view and (b) a plan view of the super pillar transistor, 
SPT. The field oxide is thinned to about half its thickness by controlled wet etching. 

Fig. 30. (a) Test circuit configuration, (b) Id-Vd characteristics, and (c) Id-Vg characteristics 
of the super pillar transistor, SPT. 

Sidewall gates on both sides of the pillar form two transistors in one pillar. Typical Id-Vg 
characteristics are shown in Fig. 30. The drain currents Id1 and Id2 are those of the two sidewall 
gate transistors. As shown in the figure, the two drain currents can be controlled separately. 
With an additional new technique for forming two capacitors on a pillar, two DRAM cells can be 
obtained on one pillar, leading to a 2F² cell. Consequently, the DRAM density can be doubled 
at the same technology node. 
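
The density gain quoted above follows directly from the cell-area factors. As a rough illustration only, the sketch below computes the ideal cell area and cell density for the 8F², 6F², 4F² and 2F² layouts mentioned in the text. The 50 nm feature size is an arbitrary example value, the function names are hypothetical, and array overhead such as sense amplifiers and decoders is ignored.

```python
# Ideal one-transistor memory cell areas, expressed in units of F^2,
# as quoted in the text. Peripheral-circuit overhead is ignored.
CELL_AREA_FACTORS = {
    "folded bit-line (8F^2)": 8,
    "open bit-line (6F^2)": 6,
    "vertical cell (4F^2)": 4,
    "SPT, two cells per pillar (2F^2)": 2,
}


def cell_area_nm2(area_factor: float, feature_size_nm: float) -> float:
    """Ideal cell area in nm^2 for a cell occupying area_factor * F^2."""
    return area_factor * feature_size_nm ** 2


def cells_per_mm2(area_factor: float, feature_size_nm: float) -> float:
    """Ideal number of cells that fit in 1 mm^2 (1 mm^2 = 1e12 nm^2)."""
    return 1e12 / cell_area_nm2(area_factor, feature_size_nm)


if __name__ == "__main__":
    F_NM = 50.0  # example feature size, chosen only for illustration
    for name, factor in CELL_AREA_FACTORS.items():
        area = cell_area_nm2(factor, F_NM)
        density = cells_per_mm2(factor, F_NM)
        print(f"{name:34s} area = {area:8.0f} nm^2, density = {density:.3e} cells/mm^2")
```

Because the area scales linearly with the factor in front of F², halving that factor from 4 to 2 doubles the ideal density regardless of the feature size chosen.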

4.5 Prospect of vertical 3-D transistor 

Although the vertical 3-D transistor is expected to offer many advantages over the 2-D 
transistor, the vertical structure imposes a fundamental limit. Apart from the complexity of 
the fabrication technologies, one of the biggest problems may be that the gate length is 
practically fixed. Since an LSI uses various gate lengths to optimize speed, power 
consumption, chip size, operational margin, and so on, vertical transistors with a single 
gate length cannot be applied to LSIs, particularly processors and ASICs. 

Under these circumstances, one promising application may be the memory cell array. Cell 
transistors in a cell array should be identical in order to obtain a compact array area and 
stable operation. Figure 31 shows possible applications of the super pillar transistor, SPT, 
to memories. Depending on the memory element chosen, various kinds of memory become 
possible. The SPT can work as "a universal cell transistor" for almost all memories with a 
one-transistor cell and can also be applied to static memory cells with multiple transistors. 



Fig. 31. Various applications of the super pillar transistor, SPT, which can be operated as a 
universal cell transistor. The schematic is labeled with plate-1/-2, memory element-1/-2, 
transistor-1/-2, word-1/-2, bit line-1/-2, and a common substrate. Memory elements and the 
resulting memories: (a) capacitor: DRAM; (b) phase change: PCM; (c) resistor: ReRAM; 
(d) magnetic film: MRAM; (e) new material: new memory; (f) ferroelectric: FeRAM; 
(g) floating gate: flash memory; (h) SiON film: flash memory; (i) new material: new memory. 

In addition to this kind of stack of a cell transistor and a memory element, a transistor stack 
structure has been proposed. At present, a 16-layer stacked NAND flash memory, named pipe-shaped 
bit cost scalable (P-BiCS) flash memory, has been proposed (Katsumata et al., 2009), as 
shown in Fig. 32. Because the silicon body of the transistors is filled into a hole etched after 
the 16-gate-layer stack is formed, there is no need to form a thin, tall silicon pillar. In 
this sense, the manufacturability of P-BiCS is expected to be more stable than that of the 
pillar type in multi-stack memory. However, transistor performance problems are suspected 
because of the polycrystalline silicon body. 

Fig. 32. Proposed 16-layer stack of NAND flash memory, named pipe-shaped bit cost scalable 
(P-BiCS) flash memory. 



5. Other approaches to 2.5-D stack LSI 

A few kinds of 3-D stacks of active transistors were extensively investigated in the 1980s, mainly 
using laser recrystallization. However, they were almost abandoned in the next decade because the 
poor integrity of the overlaid single-crystal layer caused much poorer productivity. In place of 
this active transistor stack, two kinds of chip-stack techniques have been developed, as 
shown in Fig. 33. Flash memory and DRAM already use bonding-wire connections, 
and stacks of 6 to 8 chips are now available in flash and DRAM products. An example on a test 
chip is shown in Fig. 34. 

Recently, through-silicon-via (TSV) connections have been extensively developed. This approach 
provides more flexible inter-chip connection and higher productivity owing to batch 
processing of via formation and inter-via contact. Nevertheless, this may not be a true 3-D 
stack, because the chip thickness is tens of micrometers, which is much larger than the 
device arrangement pitch of tens to hundreds of nanometers. Therefore, the chip stack is called 
"2.5 dimensional" in this article. 

Fig. 33. Two kinds of chip-stack LSIs: (a) bonding-wire connection type and (b) through-silicon-via 
(TSV) type. 

Fig. 34. Eight-layered bonding-wire connection on a test substrate. 



6. Conclusion 

In response to the ceaseless demand for higher transistor performance in LSIs, 
continual scaling has been achieved since the early 1970s. Transistor sizes in products 
measured 12 µm in 1970 and around 45 nm in 2009. The scaling of device size has 
brought about a 4-fold increase in memory capacity and processor performance every three 
years. Because the amount of signal charge in a DRAM cell limits cell-size scaling, DRAM 
first encountered a capacity limit at 1 megabit in the mid-1980s. To overcome this 
limitation, DRAM began to employ 3-D capacitor structures such as the trench capacitor or 
the stack capacitor. 
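
The scaling figures quoted above can be checked with simple arithmetic. The following sketch is illustrative only: the function names are hypothetical, and the assumption of a constant exponential rate between the two quoted feature sizes is a simplification, not a claim about the actual year-by-year history.

```python
def growth_factor(years: float, factor_per_period: float = 4.0,
                  period_years: float = 3.0) -> float:
    """Cumulative growth after `years` if capacity grows by
    `factor_per_period` every `period_years` (4x per 3 years in the text)."""
    return factor_per_period ** (years / period_years)


def linear_shrink_per_period(size_start: float, size_end: float,
                             years: float, period_years: float = 3.0) -> float:
    """Average linear scaling factor per period implied by two feature sizes,
    assuming a constant exponential shrink rate in between."""
    return (size_end / size_start) ** (period_years / years)


if __name__ == "__main__":
    # Feature sizes quoted in the text: 12 um (1970) and about 45 nm (2009),
    # both expressed here in nanometers.
    shrink = linear_shrink_per_period(12_000.0, 45.0, 2009 - 1970)
    print(f"average linear shrink per 3 years: {shrink:.2f}")
    print(f"average area shrink per 3 years:   {shrink ** 2:.2f}")
    print(f"capacity growth over 12 years:     {growth_factor(12):.0f}x")
```

Under these assumptions the area shrink per three-year period is smaller than a factor of four, which suggests that part of the quoted 4-fold increase comes from factors other than the feature-size shrink itself.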

Even with the 3-D structures, the maximum DRAM capacity per chip is estimated to be 64 
gigabits, provided that the amount of signal charge stored in a cell must be kept constant 
as the cell is scaled. To break this deadlock, the use of extra-high-k dielectrics 
and a vertical stack of a cell transistor with a capacitor will be inevitable in the near future. 
Regarding NAND flash memory, multi-stacks of flash transistors have already been 
proposed. Since a flash memory cell consists of a single cell transistor and no 
contact to the source and drain is needed within a string of cell transistors, multi-stacking is 
relatively easier than for DRAM. 

On the other hand, the field-effect transistor itself will encounter an ultimate size limit of 5-10 
nm. Only several tens of silicon atoms exist in the channel region of a 10-nm transistor. 
Normal field-effect operation will be impossible in that dimension range because of fatal 
short-channel effects. In particular, the ratio of off current to on current worsens, causing 
unacceptably large standby power consumption. 

If the present scaling pace is maintained, the ultimate limit will be reached within 15 
years. In anticipation of this limit, various kinds of 3-D transistors have been proposed; 
however, they will still suffer from the same short-channel effects as 2-D transistors. Because 
the channel length of a vertical transistor cannot be varied, in practical products 
the vertical transistor will be employed together with 2-D transistors in an LSI chip. 

To cope with these fundamental limits on device miniaturization, various kinds of chip 
stacks will become dominant in LSI products, in response to the demand for smaller packages 
for personal, hand-held products. 

1. Introduction 

Scaling of silicon dioxide dielectrics was once viewed as an effective approach to 
enhancing transistor performance in complementary metal-oxide-semiconductor (CMOS) 
technologies, as predicted by Moore's law [1]. Over the past few decades, reducing the 
thickness of silicon dioxide gate dielectrics has enabled increased numbers of transistors per 
chip with enhanced circuit functionality and performance at low cost (Fig. 1). However, as 
devices approach the sub-45 nm scale, the effective oxide thickness (EOT) of the traditional 
silicon dioxide dielectric is required to be smaller than 1 nm, which is approximately 3 
monolayers and close to the physical limit (Fig. 2). This results in high gate leakage 
currents due to quantum tunneling at this scale (Fig. 3). To continue the 
downward scaling, dielectrics with a higher dielectric constant (high-k) have been suggested 
as a solution that achieves the same transistor performance while maintaining a relatively 
large physical thickness [2]. Many candidate high-k gate dielectrics have been 
suggested to replace SiO₂, including nitrided SiO₂, Hf-based oxides, and Zr-based 
oxides. Hf-based oxides have recently been highlighted as the most suitable dielectric 
materials because of their comprehensive performance. One of the key issues with these 
new gate dielectrics is their low crystallization temperature, which makes it 
difficult to integrate them into traditional CMOS processes. To solve this problem, 
additional elements such as N, Si, Al, Ti, Ta and La have been incorporated into the high-k 
gate dielectrics, especially Hf-based oxides. In the following sections, the requirements of 
high-k oxides, a brief history of high-k development, various high-k candidates, and the 
latest hafnium-based high-k materials are discussed. 
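
The trade-off between EOT and physical thickness described above can be made concrete with the standard relation EOT = t × 3.9 / k, where t is the physical thickness of the layer, k is its dielectric constant, and 3.9 is the dielectric constant of SiO₂. The sketch below is a minimal illustration; the function names and the example k values are assumptions chosen for demonstration.

```python
K_SIO2 = 3.9  # dielectric constant of SiO2


def eot_nm(physical_thickness_nm: float, k: float) -> float:
    """Equivalent oxide thickness of a single dielectric layer."""
    return physical_thickness_nm * K_SIO2 / k


def physical_thickness_nm(target_eot_nm: float, k: float) -> float:
    """Physical thickness that gives the target EOT for dielectric constant k."""
    return target_eot_nm * k / K_SIO2


if __name__ == "__main__":
    for k in (3.9, 12.0, 25.0):  # example k values only
        t = physical_thickness_nm(1.0, k)
        print(f"k = {k:5.1f}: physical thickness for 1 nm EOT = {t:.2f} nm")
```

A higher k allows a physically thicker film at the same EOT, which is the reason high-k dielectrics reduce tunneling leakage.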

2. Requirements of high-k oxides 

Among the various requirements of gate dielectric materials, the most important are good 
insulating properties and capacitance performance (Fig. 4). Because the gate dielectric 
materials constitute the interlayer in the gate stacks, they should also be able to 
prevent diffusion of dopants such as boron and phosphorus and have few electrical defects, 
which often compromise the breakdown performance. Meanwhile, they must have good 
thermal stability, a high crystallization temperature, sound interface qualities, and so on. 

Fig. 1. Enhanced performance trend as predicted by Moore's law. Processing power has 
steadily risen as transistors become more complex [1]. 

Fig. 2. Feature size of transistors downscales with time and the gate oxide thickness 
decreases accordingly [1]. 



2.1 K value, band gap and band offset 

Fig. 4. Schematic drawing of a MOS stack. 

With regard to capacitance performance, the k value should be over 
12, preferably 25-30. An appropriate k value gives the dielectric a reasonable physical 
thickness at the target EOT: thick enough to prevent gate leakage but not so thick as to 
hamper physical scaling. On the other hand, a very large k 
value is undesirable in CMOS design because it causes unfavorably large fringing fields at 
the source and drain regions [4]. Table 1 and Fig. 5 show that the k values of some oxides 
vary inversely with the band gap, so a relatively low k value must be accepted [5]. There are 
numerous oxides with extremely large k values, such as SrTiO₃, which are candidates for 
DRAM capacitors [6], but their band gaps are too small. To meet the required insulating 
properties, the gate dielectric must have a band offset of at least 1 eV in contact 
with the Si substrate in order to avoid serious gate leakage and breakdown. The band offset 
must exceed 1 eV in order to inhibit conduction by the Schottky emission of 
electrons or holes into the oxide bands [5, 7], as schematically shown in Fig. 6. This means 
that both the conduction band (CB) offset and the valence band (VB) offset of the material 
must be over 1 eV. In practice, the CB offset is usually smaller than the VB offset, which 
suggests that oxides with band gaps narrower than about 5 eV may be excluded as gate 
dielectrics. For oxides with narrow band gaps, either the CB offset or the VB offset may be 
smaller than 1 eV, which also limits the choice of materials. 
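
The selection criteria in this subsection (k above 12 and both band offsets above 1 eV) can be expressed as a simple screening check. The sketch below is only an illustration: the `Candidate` class and `meets_requirements` function are hypothetical names, and the three entries are made-up placeholder values, not measured data for any real oxide.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    name: str
    k: float              # static dielectric constant
    cb_offset_ev: float   # conduction band offset to Si, in eV
    vb_offset_ev: float   # valence band offset to Si, in eV


def meets_requirements(c: Candidate, k_min: float = 12.0,
                       offset_min_ev: float = 1.0) -> bool:
    """True if k exceeds k_min and both band offsets exceed offset_min_ev."""
    return (c.k > k_min
            and c.cb_offset_ev > offset_min_ev
            and c.vb_offset_ev > offset_min_ev)


if __name__ == "__main__":
    # Hypothetical entries for illustration only; not measured data.
    candidates = [
        Candidate("oxide A (moderate k, wide gap)", 20.0, 1.5, 3.0),
        Candidate("oxide B (very high k, narrow gap)", 200.0, 0.1, 2.0),
        Candidate("oxide C (low k, wide gap)", 9.0, 2.5, 4.5),
    ]
    for c in candidates:
        verdict = "passes" if meets_requirements(c) else "fails"
        print(f"{c.name}: {verdict}")
```

The check deliberately omits an upper bound on k, since the text gives only a preferred range rather than a hard limit.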

Fig. 5. Static dielectric constant versus band gap for candidate gate oxides [5]. 



2.2 Thermal stability 

In present CMOS processes, the gate stack must undergo rapid thermal annealing (RTA) at 
1000 °C for 5 s. This requires the gate oxide to be thermally and chemically stable, 
especially in contact with the adjacent materials. Thus, group II, III, and IV oxides with a 
higher heat of formation than SiO₂, such as SrO, CaO, BaO, Al₂O₃, ZrO₂, HfO₂, Y₂O₃, La₂O₃ and 
the lanthanide oxides, may be useful. However, group II oxides, which react with water, are not 
favorable. Therefore, from the thermal stability point of view, only Al₂O₃, ZrO₂, HfO₂, Y₂O₃, 
La₂O₃, Sc₂O₃ and some lanthanide oxides such as Pr₂O₃, Gd₂O₃ and Lu₂O₃ are left [3]. However, 
some materials with a higher heat of formation than SiO₂, such as ZrO₂, may still be slightly 
reactive with Si, forming the silicide ZrSi₂ [8, 9]. Among these high-k dielectrics, HfO₂ has 
both a high k value and chemical stability with water and Si. 

2.3 Crystallization temperature 

Owing to the absence of grain boundaries and their good diffusion barrier properties, amorphous 
materials are preferred to crystalline ones. The grain boundaries in crystalline films can often 
act as pathways for dopant diffusion and breakdown. Unlike SiO₂, high-k oxides usually have 
low crystallization temperatures and can easily crystallize when subjected to RTA. In particular, 
HfO₂ and ZrO₂ crystallize at much lower temperatures, ~400 °C and ~300 °C, respectively 
(Fig. 7). Approaches to raising the crystallization 
temperature of HfO₂ and ZrO₂ should therefore be considered. The crystallized HfO₂ has a much 
lower leakage current, which has convinced many companies such as Intel and Freescale to 
adopt binary oxides because of their relatively higher k values. 




Fig. 7. TEM image of crystallization in HfO₂/SiO₂ dielectrics with (a) 40% HfO₂ and (b) 80% 
HfO₂ [10]. 



2.4 Interface quality 

The interface between the high-k dielectric and the Si substrate must have the highest electrical 
quality and flatness, an absence of interface defects, and a low interface state density Dit. Poor 
interface quality can cause a high fixed charge density, inducing a large shift in the flatband 
voltage (Vfb), which severely reduces the performance and reliability of the transistor. Most 
of the high-k materials reported in this chapter have Dit ~10¹¹-10¹² eV⁻¹ cm⁻² and also exhibit a 
substantial flatband voltage shift larger than 300 mV [11]. Therefore, it is crucial to improve 
the quality of the interface. There are two ways to ensure a high-quality interface: using either 
a crystalline oxide grown epitaxially on the Si or an amorphous oxide. An amorphous 
oxide has many advantages over a polycrystalline oxide. Firstly, it is more economical 
and more compatible with existing processes. Secondly, an amorphous oxide can minimize 
the number of interface defects. Thirdly, it is possible to gradually vary the composition of 
an amorphous oxide without creating a new phase, as in silicate alloys, or when adding 
nitrogen or other metal elements. Fourthly, an amorphous oxide and its dielectric constant 
are isotropic, so that fluctuations in polarization from differently oriented oxide grains will 
not cause scattering of carriers. Finally, amorphous phases have no grain boundaries. The 
advantages of epitaxial oxides may come in the future, when their more abrupt interfaces 
may allow lower EOTs to be reached. Besides the above considerations, the configuration of 
interface bonding is also significant. As the SiO₂/Si interface has high quality, the ideal gate 
dielectric stack may well turn out to have an interface comprising several monolayers of 
Si-O (and possibly N) containing material, which can be a pseudobinary layer at the channel 
interface. This layer can serve to preserve the critical, high-quality nature of the SiO₂ 
interface (Dit ~2×10¹⁰ eV⁻¹ cm⁻²) while providing a higher k value for that thin layer. The 
same pseudobinary materials can also extend beyond the interface, or a different high-k 
material can be used on top of the interfacial layer. 
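
To give a sense of scale for the flatband voltage shift caused by fixed charge, the following sketch uses the textbook sheet-charge relation ΔVfb = -Qf / Cox, with Cox = ε₀ × 3.9 / EOT. It assumes that all the fixed charge sits in a sheet at the dielectric/Si interface; the charge densities and the 1 nm EOT are illustrative assumptions, not data from the cited studies, and the function names are hypothetical.

```python
Q_E = 1.602176634e-19      # elementary charge, C
EPS_0 = 8.8541878128e-12   # vacuum permittivity, F/m
K_SIO2 = 3.9               # dielectric constant of SiO2


def oxide_capacitance_f_per_m2(eot_nm: float) -> float:
    """Gate capacitance per unit area for a given equivalent oxide thickness."""
    return EPS_0 * K_SIO2 / (eot_nm * 1e-9)


def flatband_shift_v(fixed_charge_per_cm2: float, eot_nm: float) -> float:
    """Flatband voltage shift for a sheet of fixed charge at the dielectric/Si
    interface. A positive charge density gives a negative shift."""
    charge_c_per_m2 = Q_E * fixed_charge_per_cm2 * 1e4  # cm^-2 -> m^-2
    return -charge_c_per_m2 / oxide_capacitance_f_per_m2(eot_nm)


if __name__ == "__main__":
    for n_f in (1e11, 1e12, 1e13):  # example fixed-charge densities, cm^-2
        shift_mv = 1e3 * flatband_shift_v(n_f, eot_nm=1.0)
        print(f"N_f = {n_f:.0e} cm^-2 -> flatband shift = {shift_mv:.1f} mV")
```

Charge located elsewhere in the film would shift the flatband voltage by a different amount, so this estimate is only a first-order guide.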



2.5 Defects 

Similar to interface defects, bulk defects formed in high-k oxides during deposition also 
degrade transistor performance through the increased number of defect-related fixed 
charges. In addition, charges trapped in defects cause a shift in the gate threshold 
voltage of the transistor, which is a key performance characteristic. Furthermore, the 
trapped charge changes with time, so the threshold voltage also shifts with time, leading 
to problems associated with negative bias temperature instability (NBTI) and positive bias 
temperature instability (PBTI). Meanwhile, trapped charges scatter carriers in the channel, 
reducing carrier mobility. Lastly, defects are the starting points for electrical failure and 
oxide breakdown. Typically, these defects are sites of excess or deficient oxygen or 
impurities. Unfortunately, most high-k oxides inherently have more bulk defects, and more 
interface defects in contact with the Si substrate, than SiO₂ because their bonding cannot 
relax as easily [12] (Fig. 8). Many groups are now endeavoring to reduce defect 
densities by either process control or materials engineering. 

Fig. 8. Schematic diagram of two types of defects located (a) at the HfO₂/SiO₂ interface and 
(b) in the bulk of the HfO₂ film. 

3. Brief history of high-k dielectric development 

To overcome gate leakage problems and extend the usefulness of SiO₂-based dielectrics, 
incorporation of nitrogen into SiO₂ has been adopted. There are several ways to introduce 
nitrogen into SiO₂, such as post-deposition annealing in a nitrogen ambient and forming a 
nitride/oxide stack structure. Incorporating nitrogen into SiO₂ not only increases the 
dielectric constant but also provides a better barrier against boron penetration. In addition, a 
nitride/oxide stack structure maintains the benefits of good interface quality between the 
oxide and substrate [13, 14], as schematically shown in Fig. 9. 

Despite the immense development of SiO₂-based dielectrics, these oxynitrides still have low k 
values, and so a relatively thick layer is required to prevent direct tunneling current. 
Therefore, alternative materials with a higher k than SiO₂ (3.9) are needed to achieve the 
required capacitance without tunneling currents [15]. Oxides of groups II, III, and IV such as 
Al₂O₃, Y₂O₃, La₂O₃, Sc₂O₃ and some lanthanide oxides such as Pr₂O₃, Gd₂O₃ and Lu₂O₃ have been 
proposed. Unfortunately, these dielectrics will only last a few generations because of 
limitations dictated by low-power applications, scalability, or serious reactions with the Si 
substrate. These problems are much smaller for oxides and silicates of Hf and Zr. Thus, the 
choice of alternative gate dielectrics has been narrowed to HfO₂, ZrO₂ and their silicates owing 
to their excellent electrical properties and high thermal stability in contact with Si [16]. 
However, another problem, namely low crystallization temperature, is associated with Hf-based 
and Zr-based oxides. They can easily crystallize during standard CMOS processes. The resulting 
crystalline structures can increase the gate leakage by orders of magnitude and provide pathways 
for diffusion of dopants and dielectric breakdown. To date, many groups have focused on 
raising the crystallization temperature of these oxides, and elements such as N, Si, 
Al, Ta and La have been incorporated into these high-k oxides for this purpose. Hf-based oxides 
are preferred over Zr-based oxides for their relatively higher crystallization temperature. 

Fig. 9. Schematic showing incoming nitrogen radicals replacing oxygen atoms to form Si-N 
bonds [17]. Legend: filled circles, nitrogen atoms; open circles, oxygen atoms. 




4. Latest development in Hf-based high-k oxides 

4.1 Fabrication methods 

Hf-based high-k dielectric oxides have replaced conventional SiO₂ as the gate dielectric in 
sub-0.1 µm complementary metal-oxide-semiconductor devices [18, 19]. The fabrication 
technology of Hf-based high-k ultrathin dielectrics has developed very quickly. 
Overall, the techniques can be categorized into two major approaches based on the reaction 
mechanism during preparation, namely chemical vapor deposition (CVD) and physical vapor 
deposition (PVD) processes. CVD-based approaches include metal-organic 
chemical vapor deposition (MOCVD) [20], plasma-enhanced chemical vapor deposition 
(PECVD) [21], atomic-layer chemical vapor deposition (ALCVD) [22], photo-assisted CVD 
synthesis [23] and so on. These growth methods provide more flexibility and have 
relatively low cost. Among them, ALCVD is considered particularly promising, since it is 
the only feasible method for controlling both the thickness down to the nanometer range and 
the layer-by-layer composition of the metal oxide ultrathin film [24]. 

4.2 Doping of Hf-based high-k oxides 

Crystallization of pure HfO₂ occurs at only about 400-450 °C, causing grain boundary 
leakage current and nonuniformity of the film thickness [25]. As a result, impurities such as 
O, B, and P can penetrate along the grain boundaries during high-temperature postprocessing. This 
raises equivalent oxide thickness (EOT) scaling and reliability concerns when Hf-based 
high-k ultrathin gate oxides are integrated into high-temperature CMOS processes [26]. 
Recently, nitrogen incorporation has been extensively investigated in the field of high-k thin 
films [27, 28]. Nitrogen introduction into HfO₂ films significantly improves the electrical 
properties as well as the crystallization behavior [29, 30]. On the other hand, nitrogen doping 
leads to a decreased band gap. This is because it adds N 2p states, which lie above the O 2p 
states in the free atoms, so the VB is raised, while the CB is lowered by the interaction between 
the nonbonding Hf 5d states and adjacent O and N states. The delocalized Hf d-N p bonding 
states contribute an indirect band gap Eg of 1.8 eV, which is smaller than the O p-Hf d band 
gap of more than 5.8 eV [31, 32]. Despite these disadvantages, the introduced nitrogen can 
suppress the growth of the microstructure and the interfacial layer. When N is added to HfO₂, it 
is expected to distort the equilibrium of the lattice and produce disordered states. Choi et al. 
have demonstrated that adding nitrogen reduces the mobility of Hf and O 
atoms and increases the nucleation temperature and consequently the crystallization 
temperature [33, 34]. All these results indicate that nitrogen acts as a crystallization 
inhibitor and increases the crystallization temperature of Hf-based gate dielectrics (Fig. 10). 

The interfacial layer between the high-k dielectric and the Si substrate is one of the key factors 
determining the performance and reliability of a MOS transistor. Hence, it is crucial 
to fabricate a SiO₂/Si-like interface. From this viewpoint, a SiO₂ interfacial layer is often 
grown between the Hf-based oxide and Si by thermal oxidation. However, this HfO₂/SiO₂ gate 
dielectric stack usually introduces an additional EOT increase due to the low-k SiOx interfacial 
layer. To solve this problem, adding Si to the Hf-based oxide to form Hf silicate may be a 
plausible means. Besides improving the interface quality, incorporating Si into Hf-based 
oxides can also foster the formation of amorphous or near-amorphous structures [36, 
37]. A negative effect is the reduction in the k value, which decreases with 
increasing Si concentration in Hf-based oxides. When the Si content approaches 100% 
(that is, a Si-based oxide), the k value approaches its lowest value of 3.9. Accordingly, 
the Si content must be selected to balance the gains against the drawbacks. 
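
The EOT penalty of a low-k interfacial layer mentioned above can be estimated by treating the gate dielectric as capacitors in series, in which case the EOT contributions of the individual layers simply add. The sketch below is illustrative only: `stack_eot_nm` is a hypothetical helper, and the thicknesses and k values are assumed example numbers rather than measured data.

```python
K_SIO2 = 3.9  # dielectric constant of SiO2


def stack_eot_nm(layers):
    """EOT of dielectric layers in series.

    `layers` is an iterable of (physical_thickness_nm, k) pairs. Series
    capacitances add as reciprocals, so the per-layer EOT values add.
    """
    return sum(t_nm * K_SIO2 / k for t_nm, k in layers)


if __name__ == "__main__":
    # Illustrative values only: 3 nm of a k = 20 Hf-based oxide,
    # with and without a 0.5 nm SiO2-like interfacial layer.
    high_k_only = stack_eot_nm([(3.0, 20.0)])
    with_interlayer = stack_eot_nm([(0.5, K_SIO2), (3.0, 20.0)])
    print(f"high-k only:               EOT = {high_k_only:.3f} nm")
    print(f"with 0.5 nm SiO2 interlayer: EOT = {with_interlayer:.3f} nm")
```

Because the interfacial layer has k close to 3.9, nearly all of its physical thickness appears directly in the EOT, which is why even a thin layer matters at sub-nanometer EOT targets.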

Fig. 10. XRD spectra for the HfO₂ and HfOxNy films: (a) as-deposited and HfOxNy films 
annealed at different temperatures and (b) as-deposited and HfO₂ films annealed at 
different temperatures [35]. 

HfSiON is thermally stable compared to HfO₂ owing to the Si-N bonds created by the 
nitridation step, and thus HfSiON has the potential for implementation in a conventional 
gate-first process with high-temperature activation annealing. With nitrogen-incorporated 
HfSiO films, both the oxidation and reduction reactions can be suppressed in 
the annealing process at a proper partial pressure of N₂ gas. The N₂ gas suppresses only the 
reduction reaction, while nitrogen atoms incorporated in the dielectric suppress both the 
oxidation and reduction reactions, greatly improving the electrical characteristics of Hf-based 
high-k dielectrics [38]. Fig. 11 schematically shows the mechanism by which the 
reactions are suppressed, and the resulting suppression of interfacial layer growth can be seen 
in Fig. 12. 

Many groups have reported that the crystallization temperature of HfO₂ (400-450 °C) can be 
increased by incorporating Al₂O₃ to form an HfAlO alloy. Zhu et al. [39] have shown that 
Al inclusion in HfO₂ significantly increases the crystallization temperature. At an Al content 
of 31.7%, the crystallization temperature is about 400-500 °C higher than that without Al. 

Fig. 11. Schematic of the mechanism for the suppression of reaction. N₂ ambient gas can 
suppress (i) SiO formation and (ii) SiO desorption. Nitrogen atoms in the dielectric film can 
suppress (iii) SiO and O diffusion [38]. 

Fig. 12. SiO₂ equivalent thickness of dielectric films as a function of O₂ partial pressure 
(PO₂). These thicknesses were calculated from the peak-area ratio of the Si-oxide to the Si 
substrate, regarding the Si-oxide component as SiO₂ for simplicity. A straight line at around 
3.2 nm denotes the thickness of the as-grown sample [38]. 

This additional Al increases the band gap of the dielectric from 5.8 eV for HfO₂ without Al 
to 6.5 eV for HfAlO with 45.5% Al, but reduces the dielectric constant from 19.6 for HfO₂ 
without Al to 7.4 for Al₂O₃ without Hf. Considering the crystallization 
temperature, band gap, and dielectric constant, they conclude that the optimum Al 
concentration is about 30% for conventional CMOS gate processing technology. Moon et al. 
[40] have presented a similar trend in the change of the electrical and structural properties 
due to Al incorporation. Their results suggest that the HfAlO film with 10% Al₂O₃ shows 
a great improvement in thermal stability and a significant reduction of interfacial layer 
growth during subsequent thermal processes while maintaining a high k value (~19), 
leading to a reduction in the leakage current by around 2 orders of magnitude compared to 
pure HfO₂. The HfAlO film also has good compatibility with the gate electrode in the 
high-temperature annealing process (Fig. 13). Bae et al. [41] have pointed out that while Al 
doping significantly increases the crystallization temperature of HfO₂ to up to 900 °C and 
improves its thermal stability, it also introduces negative fixed oxide charges due to Al 
accumulation at the HfAlO-Si interface, resulting in mobility degradation. The effects of Al 
concentration on the crystallization temperature, fixed oxide charge density, and mobility 
degradation in HfAlO have been characterized and correlated. In spite of these analyses, 
many issues remain to be settled in order to maximize the performance of the 
materials. 

Fig. 13. XTEM images of HfO₂ and HfAlO after 700 °C in-situ PDA treatment. The HfAlO layer 
remains amorphous while HfO₂ is crystallized. Both films were deposited at 400 °C without 
surface nitridation [40]. 

On account of their good thermal stability and electrical characteristics, HfTaO gate dielectrics 
have attracted attention. Incorporation of Ta into HfO₂ enhances the crystallization 
temperature dramatically while keeping a high k value of ~17 [42]. Compared to HfO₂ gate 
dielectrics, HfTaO also has the advantages of much lower charge trapping and BTI 
degradation and increased channel mobility [43]. Yu et al. [44] have confirmed that HfTaO 
with 43% Ta remains amorphous even after annealing at 950 °C for 30 s, and that the formation 
of the low-k interfacial layer is reduced (Fig. 14). The results indicate good interface properties 
between the HfTaO and the Si substrate and sufficiently suppressed boron penetration 
in the HfTaO film. The negligible flatband voltage shift observed in the HfTaO film with 43% Ta 
is attributed to its amorphous structure after device fabrication, which also 
contributes to the improved performance and reliability of the devices. Zhang et al. 
[45] have found that HfTaO with 40% Ta exhibits the highest crystallization temperature of 
900 °C, while HfTaO films with 35% and 52% Ta show a crystallization temperature of 800 °C 
(Fig. 15). The results demonstrate that HfTaO N-MOSFETs possess higher electron mobility than 
control HfO₂ devices. Among them, the transistors with 40% Ta-doped HfTaO as the gate 
dielectric have the highest electron mobility (Fig. 16). 

Fig. 14. TEM images of HfO₂ and HfTaO with 43% Ta after PDA at 700 °C for 40 s and 
activation annealing at 950 °C for 30 s. The pure HfO₂ film is fully crystallized whereas the 
HfTaO film with 43% Ta remains amorphous [44]. Image labels: HfO₂, 41.6 Å; HfTaO with 43% Ta, 
37.5 Å. 

Fig. 15. Crystallization temperature of HfTaO with different Ta compositions, measured by 
XRD at an X-ray incident angle of 3° [45]. 

Fig. 16. Effective mobility of HfTaO N-MOSFETs (a) without and (b) with D₂ annealing [45]. 

In addition, some rare earth elements such as La can also improve the characteristics of Hf-based 
high-k dielectrics. Introduction of La₂O₃ into HfO₂ increases the 
crystallization temperature (Fig. 17). Furthermore, unlike other Hf-based amorphous 
materials such as HfSiOx or HfAlOx, HfLaOx retains a high k value 
(>20) [46] (Fig. 18). HfLaO also has the advantages of much lower charge trapping 
and BTI degradation and increased channel mobility. In addition, varying the La 
concentration in the TaN/HfLaO or HfN/HfLaO gate stack can effectively tune the metal 
work function for N-MOSFETs [43]. From the capacitance-voltage curves of metal-oxide-semiconductor 
capacitors, Yamamoto et al. [46] have shown that the HfLaOx dielectric film 
exhibits very small degradation in both the interface and bulk properties, as shown in Fig. 
19. A very low fixed charge density in HfLaOx films is demonstrated by the very weak 
dependence of the flatband voltage on film thickness in their study. 
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Fig. 17. XRD spectra of 30 nm films of (a) Hf0 2 , (b) La 2 3 , (c) 33% La-HfLaO x , and (d) 40% 
La- HfLaOx annealed at various temperatures. LUO2 and La2C>3 films crystallize under 600 
°C. On the other hand, 40% La-HfLaOx film remains amorphous after 900 °C annealing 
[46]. 

An et al. [47] have synthesized ultrathin HfO2 and HfLaOx films with La/(Hf+La) ratios of 
42%, 57%, and 64% by an atomic layer deposition process. By measuring the leakage 
current at different temperatures, they propose that the conduction mechanism of the HfO2 
and HfLaOx films follows the Poole-Frenkel emission model under the gate injection 
condition. They have also demonstrated that the intrinsic trap energy levels are 1.42, 1.34, 
1.03, and 0.98 eV in the HfLaOx samples with La/(Hf+La) ratios of 0%, 42%, 57%, and 64%, 
respectively, decreasing as the La content increases (Fig. 20). 

Fig. 18. Dielectric constant of the HfLaOx film as a function of La concentration. The 
dielectric constants are determined from MIM capacitors for the samples with La 
concentrations of 0%, 4%, 9%, and 17%. For 20% and 40%, CET vs physical thickness plots 
were used [46]. 

Fig. 19. C-V characteristics of an Au/40% La-HfLaOx/p-Si MOS capacitor annealed at 600 °C. 
The film thickness was 8.4 nm. It shows very small hysteresis and frequency dispersion. 
The inset in the upper right shows the flatband voltages of Au/20% La-HfLaO/p-Si and 
Au/40% La-HfLaO/p-Si MOS capacitors [46]. 

Fig. 20. ln(J/E) vs 1/T plots measured at various applied electric fields for (a) HfO2 and (b) 
Hf0.36La0.64Ox films, and (c) trap energy level as a function of E^(1/2) for both samples [47]. 
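
The Poole-Frenkel analysis behind Fig. 20 can be sketched in a few lines of Python. In the standard Poole-Frenkel model, J/E is proportional to exp[-(φt - β·E^(1/2))/kT], so the slope of ln(J/E) versus 1/T at a fixed field gives an effective barrier, and extrapolating that barrier against E^(1/2) to zero field gives the intrinsic trap energy φt. The sketch below is only an illustration under that assumption: the function names, the field and temperature grids, and the barrier-lowering coefficient are hypothetical, and the data are synthetic rather than taken from [47].

```python
import math

K_B_EV = 8.617333262e-5  # Boltzmann constant in eV/K


def linear_fit(xs, ys):
    """Ordinary least-squares line through (xs, ys); returns (slope, intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def activation_energy_ev(temps_k, j_over_e):
    """Effective barrier (eV) at one field from the slope of ln(J/E) versus 1/T."""
    inv_t = [1.0 / t for t in temps_k]
    log_je = [math.log(v) for v in j_over_e]
    slope, _ = linear_fit(inv_t, log_je)
    return -slope * K_B_EV


def intrinsic_trap_energy_ev(fields, barriers_ev):
    """Zero-field trap energy (eV): intercept of the barrier versus sqrt(E) line."""
    sqrt_e = [math.sqrt(f) for f in fields]
    _, intercept = linear_fit(sqrt_e, barriers_ev)
    return intercept


# Synthetic demonstration (hypothetical numbers, not measured data).
temps = [300.0, 350.0, 400.0]        # K
fields = [1.0e6, 2.0e6, 3.0e6]       # V/cm
true_trap, lowering = 1.42, 3.0e-4   # eV, eV per sqrt(V/cm)
barriers = []
for field in fields:
    barrier = true_trap - lowering * math.sqrt(field)
    ratios = [math.exp(-barrier / (K_B_EV * t)) for t in temps]
    barriers.append(activation_energy_ev(temps, ratios))
print(round(intrinsic_trap_energy_ev(fields, barriers), 3))
```

With real measurements, the `ratios` list would be replaced by measured J/E values at each temperature for each applied field.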

From the above results, it can be inferred that HfLaOx is a promising amorphous high-k 
gate insulator for future advanced complementary metal oxide semiconductor (CMOS) 
devices. 



5. Conclusion 

This chapter succinctly reviews the motivation to replace traditional SiO2 gate dielectrics, 
the requirements of high-k dielectrics, a brief history of high-k materials development, and 
the latest developments in Hf-based high-k dielectrics. In order to improve the performance 
of CMOS devices, Hf-based gate layers are being integrated into MOSFETs to achieve low 
leakage current. Transistors with improved performance that use Hf-based gate dielectrics 
as the insulating layers are expected. Although much progress has been made in 
fabricating novel gate dielectrics, investigation of these Hf-based high-k gate dielectrics 
continues to be exciting and the final target has not yet been reached. There is still room for 
development, and many issues need better understanding. 

1. Introduction 

High-electron-mobility transistors (HEMTs) and heterojunction bipolar transistors (HBTs) 
have attracted much attention in high-speed and power applications due to their superior 
transport properties. As compared to AlGaAs pseudomorphic HEMTs (PHEMTs), 
InGaP-related devices have advantages such as higher band gaps, higher valence-band 
discontinuity [1], negligible deep-complex (DX) centers [2], excellent etching selectivity 
between InGaP and GaAs, good thermal stability [3-5], and higher Schottky barrier heights 
[3]. In particular, an undoped InGaP insulator takes advantage of its low density of DX 
centers and low reactivity with oxygen [6-10], although such devices may still suffer from 
high gate leakage. In order to suppress the gate leakage, increase the power handling 
capability, and improve the breakdown voltage, the metal-oxide-semiconductor (MOS) 
structure has been widely investigated. However, a reliable native oxide film grown on 
InGaP is still lacking, and very few papers have reported on InGaP/InGaAs MOS-PHEMTs. 
The MOS-PHEMT not only has the advantages of the MOS structure (e.g., lower leakage 
current and higher breakdown voltage) but also has a high-density, high-mobility 2DEG 
channel. 
Over the past years, the liquid phase oxidation (LPO) of InGaP near room temperature has 
been studied [11-14]. The application of surface passivation to improve the performance of 
InGaP/GaAs HBTs has also been demonstrated for the first time [13]. The InGaP/GaAs 
HBTs with surface passivation by LPO exhibit significant improvement in current gain at 
low collector current regimes due to the reduction of surface recombination current, as 
compared to those without surface passivation. Moreover, a larger breakdown voltage and a 
lower base recombination current are also obtained. In this chapter, the oxide film 
composition and some related issues are addressed. Then, the use of a thin InGaP native 
oxide film prepared by LPO as the gate dielectric for InGaP/InGaAs MOS-PHEMTs is 
discussed, and comparisons between InGaP/GaAs HBTs with and without LPO passivation 
are reviewed. 

2. Characterization of the oxide film 

The root mean square (rms) surface roughness of the In0.49Ga0.51P sample, measured by 
AFM, is estimated to be 1.1 nm before oxidation (i.e., as received) and improves to 0.95 nm 
after oxidation (i.e., as grown), as shown in Fig. 1. Fig. 2 shows the SIMS depth profiles 
before and after liquid phase oxidation of In0.49Ga0.51P. Although LPO on InGaP has a 
much slower oxidation rate (less than 10 nm/h) than on GaAs, it is still feasible to grow a 
thin oxide film without pH control [15, 16]. The oxidation rate saturates when the oxidation 
time is longer than an hour, as measured using a Veeco Instrument DEKTAK and confirmed 
by SEM. 

Fig. 1. AFM images of the In0.49Ga0.51P sample before (i.e., as received) and after (i.e., as 
grown) liquid phase oxidation. 

Fig. 2. SIMS depth profiles of the In0.49Ga0.51P sample before (i.e., as received) and after 
(i.e., as grown) liquid phase oxidation. 

The XPS depth profiles of the LPO-grown oxide on In0.49Ga0.51P are shown in Fig. 3(a). 
Fig. 3(b)-(d) show the XPS surface spectra of the Ga-3d, In-3d, and P-2p core levels, 
respectively. The binding energies for all spectra are calibrated with the reference 
(as-received) signal. The as-received sample was dipped into a solution of HF:H2O = 1:200 
for 30 s before measurement. From Fig. 3(c)-(d), in comparison with the previous paper [17], 
the spectrum is rather similar to that of InPO4. This is also confirmed by the values of the 
O-1s peak energy and the energy separations between the main core levels (i.e., Ga-3d, 
In-3d, and P-2p) in the oxide phases [18]. This clearly suggests that the oxide film is mostly 
composed of InPO4-like and Ga oxides. In addition, the oxide film may appear to be etched 
back in the growth solution after 2 h of oxidation. The thermal stability of the oxide layer is 
also important in device fabrication because high-temperature processes are usually 
required. Again, XPS is utilized to analyze the surface chemistry of the oxide films, as 
shown in Fig. 3. After 2 h of oxidation, RTA was performed in a furnace with flowing N2 at 
300-700 °C for 1 min [13]; however, an InPO4-like peak is still observed. InPO4 (bandgap 
energy = 4.5 eV) is chemically stable and has rather good dielectric properties [19]. As a 
result, the InPO4 probably acts as a capping layer for the entire oxide film and enhances its 
thermal stability. However, the experimental results show that high-temperature treatment 
(700 °C) changes the properties of Ga2O3: the XPS peak of Ga2O3 shifts to a lower binding 
energy, which is inferred to indicate the formation of GaOx or Ga2Ox. 

Fig. 3. (a) XPS depth profiles of the as-grown oxide film on In0.49Ga0.51P. (b)-(d) XPS 
surface spectra of the Ga-3d, In-3d, and P-2p core levels, respectively. 



3. InGaP/InGaAs MOS-PHEMT 

3.1 Experimental 

Figure 4 schematically shows the PHEMT structure grown by metalorganic chemical vapor 
deposition (MOCVD) on a semi-insulating GaAs substrate. Hall measurement indicates that 
the electron mobility is 4000 cm²/V·s and the electron sheet density is 2.2×10¹² cm⁻² at 
room temperature [11]. The device isolation was accomplished by mesa wet etching down to 
the buffer layer. The Au/Ge/Ni ohmic contacts were deposited by evaporation and patterned 
by lift-off, followed by RTA. The gate recess depth is 110 nm for the reference PHEMT and 
100 nm for the MOS-PHEMT. After etching the capping layer and part of the Schottky layer, 
an LPO growth solution was used to generate the gate oxide for the MOS-PHEMT at 50 °C 
for 30 min. Finally, the gate electrode was formed with Au. Moreover, the oxide layer, as 
illustrated in the figure, also selectively and simultaneously passivated the isolated surface 
sidewall. The gate dimension is 2×100 µm² with a drain-to-source spacing of 5 µm. 

Fig. 4. Schematic drawing of the InGaP/InGaAs MOS-PHEMT. 
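
As a quick consistency check on the Hall data quoted above, the channel sheet resistance follows from R_sh = 1/(q·n_s·µ). The short Python sketch below is an illustrative calculation only; the function name is hypothetical, and the chapter itself does not report a sheet resistance.

```python
ELEMENTARY_CHARGE_C = 1.602176634e-19  # coulombs


def sheet_resistance_ohm_per_sq(mobility_cm2_per_vs, sheet_density_per_cm2):
    """Sheet resistance R_sh = 1 / (q * n_s * mu), in ohms per square."""
    return 1.0 / (ELEMENTARY_CHARGE_C * sheet_density_per_cm2 * mobility_cm2_per_vs)


# Hall values from the text: mobility 4000 cm^2/V.s, sheet density 2.2e12 cm^-2.
r_sh = sheet_resistance_ohm_per_sq(4000.0, 2.2e12)
print(f"{r_sh:.0f} ohm/sq")
```

For the quoted mobility and sheet density this evaluates to roughly 709 ohms per square.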



3.2 Results and discussion 

Figure 5(a) compares the measured I-V characteristics of the MOS-PHEMT with those of the 
reference PHEMT fabricated under identical conditions. Clearly, good pinch-off and 
saturation current characteristics are obtained. Due to the higher energy barrier between 
the metal gate and the Schottky layer, the MOS-PHEMT can be operated at higher 
gate-to-source voltage (Vgs) and drain-to-source voltage (Vds) than the conventional 
Schottky-gate PHEMT, which enhances the current driving capability. Fig. 5(b) compares 
the transconductance gm and the drain current density Id as a function of Vgs at Vds = 4 V 
for the MOS-PHEMTs and the reference PHEMT. For the MOS-PHEMT, the 1.8 V-wide gate 
voltage swing (defined by a 10% reduction from the maximum gm) is wider than that of the 
PHEMT. The threshold voltage Vth of the MOS-PHEMT shifts to the left, which is similar to 
the result for devices with an oxide deposited on the Schottky layer [20, 21]. However, the 
separation between the oxide-InGaP interface and the InGaAs channel for the MOS-PHEMT 
is still larger than that of the reference PHEMT in this study, so the drain current density of 
the PHEMT is smaller than that of the MOS-PHEMT at the same bias Vgs due to the decrease 
of the carrier concentration within the InGaAs 2DEG channel. 





[Fig. 5 plots: (a) drain current density (mA/mm) versus drain-to-source voltage VDS (V) for 
the InGaP/InGaAs MOS-PHEMT and the InGaP/InGaAs PHEMT, W/L = 100 µm/2 µm; (b) 
transconductance and drain current density (mA/mm) versus gate-to-source voltage VGS (V) 
for the MOS-PHEMT with shallow gate recess, the MOS-PHEMT with deeper gate recess, and 
the PHEMT.] 

Fig. 5. (a) Measured I-V characteristics of the MOS-PHEMT and the PHEMT. (b) The 
transconductance and the drain current density versus Vgs at Vds = 4 V for the MOS-PHEMT 
and the reference PHEMT. 

In addition, if the gate recess is etched to a depth of 120 nm, Vth becomes more positive 
(-0.5 V) for the MOS-PHEMT under identical processing conditions, including the initial pH 
value (5.0), temperature (50 °C), and oxidation time (30 min). Vth shifts to the right because 
the separation between the oxide-InGaP interface and the InGaAs channel layer is 
decreased by the consumption of InGaP during the gate recess and during the LPO reaction 
with InGaP, which strengthens the control of the gate bias over the channel. However, the 
maximum gm decreases to 63 mS/mm, accompanied by a degradation of the saturation 
current to 84 mA/mm at Vgs = 1 V. The same trend is confirmed for a longer oxidation time, 
i.e., a thicker oxide layer. This drawback can be overcome by suitable device structures, 
such as inserting a Si-planar doping layer under the InGaAs channel to increase the carrier 
density. 
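
The gate voltage swing used above (the Vgs range over which gm stays within 10% of its maximum) can be extracted from a sampled gm-Vgs curve as sketched below in Python. This is a minimal illustration: the function names and the sample data are hypothetical, and linear interpolation between sweep points is an assumption rather than the authors' stated procedure.

```python
def _interpolate_x(x0, y0, x1, y1, y_target):
    """Linearly interpolate the x value at which the segment reaches y_target."""
    return x0 + (y_target - y0) * (x1 - x0) / (y1 - y0)


def gate_voltage_swing(vgs, gm, fraction=0.9):
    """Width of the Vgs window around the gm peak where gm >= fraction * max(gm)."""
    peak = max(gm)
    level = fraction * peak
    i_peak = gm.index(peak)

    # Walk left from the peak until gm drops below the level.
    left = vgs[0]
    for i in range(i_peak, 0, -1):
        if gm[i - 1] < level:
            left = _interpolate_x(vgs[i - 1], gm[i - 1], vgs[i], gm[i], level)
            break

    # Walk right from the peak until gm drops below the level.
    right = vgs[-1]
    for i in range(i_peak, len(gm) - 1):
        if gm[i + 1] < level:
            right = _interpolate_x(vgs[i], gm[i], vgs[i + 1], gm[i + 1], level)
            break

    return right - left


# Hypothetical sweep: Vgs in volts, gm in mS/mm (not measured data).
sample_vgs = [-2.0, -1.0, 0.0, 1.0, 2.0]
sample_gm = [0.0, 100.0, 100.0, 100.0, 0.0]
swing = gate_voltage_swing(sample_vgs, sample_gm)
print(swing)
```

If gm never falls below the level on one side of the peak, the sketch simply uses the end of the sweep on that side, so the result is then a lower bound on the true swing.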

The oxide film provides an improvement in the breakdown voltage in terms of the gate 
leakage current of the MOS structure, as supported by the typical gate-to-drain I-V 
characteristics shown in Fig. 6(a). For the InGaP MOS-PHEMT, the turn-on voltage, 2.2 V, is 
clearly higher than that of the InGaP PHEMT, 0.8 V, and the corresponding reverse 
gate-to-drain breakdown voltages, BVgd, are -14.1 V and -6.5 V, respectively. The turn-on 
voltage and BVgd are defined as the voltages at which the gate current reaches 1 mA/mm. 
The gate leakage current is suppressed by more than two orders of magnitude with an 
oxide film at Vgd = -4 V. The smaller gate leakage current of the MOS-PHEMT is due to the 
MOS structure and the elimination of sidewall leakage paths that are directly passivated 
during the oxidation, which is consistent with the result of Fig. 5. In addition, the gate 
leakage current observed in the MOS-PHEMT comes from a gate leakage path at the edge of 
the mesa [22] that is not present in the MOS capacitor, which may contribute to the 
Schottky-like I-V characteristics for forward biases. Fig. 6(b) shows the gate current density 
as a function of reverse Vgs at different Vds. Due to the high electric field existing in the 
gate-to-drain region, hot electron phenomena occur in the narrow band-gap InGaAs channel. 
Electrons can obtain higher energy to generate electron-hole pairs through enhanced 
impact ionization, resulting in easy injection of the holes into the gate terminal [23]. 
However, in InGaP-related devices, it is more difficult for the holes generated by impact 
ionization to overcome the valence band discontinuity and reach the gate [4], so the 
bell-shaped behavior of impact ionization does not appear in Fig. 6. Moreover, the gate 
current density of the MOS-PHEMT, which is less than 0.5 µA/mm, is significantly improved 
as compared to that of the PHEMT. In other words, the electrons and holes generated by 
impact ionization are decreased, which further reduces the drain and gate currents owing 
to the oxide layer with a high barrier height. 
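
The turn-on and breakdown voltages above are both defined by a current criterion (gate current reaching 1 mA/mm). A minimal Python sketch of that extraction from a sampled sweep follows. The function name and the sample sweep are hypothetical, and linear interpolation between neighbouring points is an assumption; on a steep, near-exponential I-V curve a logarithmic interpolation may be more appropriate.

```python
def voltage_at_current(voltages, currents_ma_per_mm, criterion=1.0):
    """Return the voltage at which |I| first reaches the criterion (mA/mm).

    The sweep is assumed to start near zero bias and move outward, so the
    current magnitude grows along the lists. Returns None if the criterion
    is never reached within the sweep.
    """
    for i in range(1, len(voltages)):
        i_prev = abs(currents_ma_per_mm[i - 1])
        i_next = abs(currents_ma_per_mm[i])
        if i_prev < criterion <= i_next:
            fraction = (criterion - i_prev) / (i_next - i_prev)
            return voltages[i - 1] + fraction * (voltages[i] - voltages[i - 1])
    return None


# Hypothetical reverse gate-to-drain sweep (not measured data).
sweep_v = [0.0, -5.0, -10.0, -15.0]
sweep_i = [0.0, -0.1, -0.5, -1.5]
bv_gd = voltage_at_current(sweep_v, sweep_i)
print(bv_gd)
```

The same function applies to the forward sweep for the turn-on voltage, since only the magnitude of the current is compared with the criterion.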

In order to gain better insight into the transient behavior of the studied devices, gate 
pulse measurements were performed using a Tektronix 370A curve tracer [24]. Vgs was 
pulsed from Vth to 0 V with a pulsewidth of 80 µs, while Vds was swept from 0 to 4 V. 
The comparisons between the static and pulsed I-V characteristics for the PHEMT and the 
MOS-PHEMT are shown in Fig. 7. The drain current of the PHEMT decreased by 9.8%, while 
that of the MOS-PHEMT decreased by only 0.63%. If the pulsewidth is too short, electrons 
captured by the traps do not have enough time to be fully emitted. However, if the 
pulsewidth is long enough, all the trapped electrons are de-trapped and contribute to the 
drain current. We believe that the differences between dc and pulsed I-V characteristics 
would become more evident with shorter gate voltage pulses, such as pulses shorter than 
10 µs, for both the PHEMT and the MOS-PHEMT. Therefore, it is clear that the oxide 
passivation on the Schottky layer can minimize the effect of surface traps, which is 
consistent with the lower gate leakage current in Fig. 6. 

Fig. 6. (a) The typical Ig-Vgd characteristics of the PHEMT with and without an oxide film. 
(b) The gate current density versus reverse Vgs at different Vds. 

Fig. 7. Gate pulse measurements for (a) the reference PHEMT and (b) the MOS-PHEMT with 
Vgs pulsed from Vth to 0 V with a pulsewidth of 80 µs, while Vds was swept from 0 to 4 V. 



4. InGaP/GaAs HBT with LPO passivation 

4.1 Experimental 

The structure used for the HBT is given in Table 1. The epilayers were grown by a 
low-pressure MOCVD system on a (100)-oriented semi-insulating (S.I.) GaAs substrate. For 
the InGaP/GaAs HBTs, device fabrication began with emitter definition. The emitter cap 
layer was removed, with the etch stopping at the InGaP active layer. After removing the 
InGaP layer, a growth solution was used to form the base oxide (passivation) on the 
exposed extrinsic base surface, and the base contact was then deposited. Finally, the base 
mesa was defined and etched to the sub-collector before the collector contact deposition. 
An H3PO4-based etchant was used for GaAs and InGaP. The Au/Ge and Au/Be metals 
were deposited by evaporation and patterned by lift-off to form the emitter, base and 
collector contacts. 

Table 1. The epitaxial structure of the InGaP/GaAs HBT. 



4.2 Results and discussion 

Figure 8 shows the common-emitter I-V characteristics of the HBTs with and without surface 
passivation by LPO. Clearly, the dc current gain (β) of the HBTs with passivation is 
improved by 15% compared to that of the HBTs without passivation. The higher β with 
surface passivation is due to the reduction of the surface recombination current in the 
exposed extrinsic base regions by the LPO method. The common-emitter I-V characteristics 
of the devices with and without surface passivation at low collector current regimes are 
shown in Fig. 9. The devices with surface passivation have higher common-emitter β than 
those without passivation, due to the reduction of the surface recombination velocity by 
the oxide layer on the base surface. In addition, the β values with and without passivation 
are 13.3 and 2 at Ib = 900 pA, respectively. This corresponds to a maximum increase of 
about sevenfold in the current gain at collector currents down to the nA level. 

Fig. 8. Common-emitter I-V characteristics of the HBTs with and without LPO passivation. 

Fig. 9. Common-emitter I-V characteristics of the HBTs (a) without and (b) with LPO 
passivation at low collector current regimes. 
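
The gain comparison quoted above for Ib = 900 pA can be reproduced with simple arithmetic, as in the Python sketch below. The function names are hypothetical; only the β values and the base current come from the text.

```python
def collector_current_a(beta, base_current_a):
    """Collector current implied by a dc current gain: Ic = beta * Ib."""
    return beta * base_current_a


def gain_improvement(beta_passivated, beta_unpassivated):
    """Ratio of the current gain with passivation to that without."""
    return beta_passivated / beta_unpassivated


base_current = 900e-12  # A, i.e. 900 pA as quoted in the text
ic_with = collector_current_a(13.3, base_current)      # about 12 nA
ic_without = collector_current_a(2.0, base_current)    # about 1.8 nA
ratio = gain_improvement(13.3, 2.0)
print(ic_with, ic_without, ratio)
```

The ratio of 13.3 to 2 is 6.65, consistent with the roughly sevenfold increase stated above, and both collector currents fall in the nA range.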

Figure 10 illustrates the measured Gummel plots of the devices with and without LPO 
passivation. The collector currents are almost identical and are not affected by the 
passivation treatment. However, a decrease of the base leakage current at low collector 
current levels is clearly observed after oxidation. Moreover, it is found that the 
recombination currents at the extrinsic base region and at the base-emitter perimeter 
compete with one another, resulting in current reduction at the lower base-emitter bias 
Vbe = 0.4 V. The increase in β is owing to the reduction of the surface recombination 
current. This also indicates that the device with passivation exhibits higher β than that 
without passivation at lower Vbe bias. The comparison of β versus the collector current is 
shown in Fig. 11. The collector-base bias is maintained at 0 V. Clearly, the device with LPO 
passivation shows a wider collector current regime, from 10⁻¹⁰ A to 0.1 A. A maximum 
fivefold shift in the current gain, from a collector current of 8.1×10⁻¹⁰ A to 1.6×10⁻¹⁰ A, 
can be achieved. This is attributed to the suppression of the surface state density, i.e., the 
surface recombination current is effectively reduced. The inset shows the base-collector 
junction current against bias voltage for the devices with and without passivation. For the 
device with passivation, the breakdown voltage (23.5 V) is higher than that without 
passivation (21.9 V) at I = 50 µA. The smaller leakage current is owing to the reduction of 
the surface recombination by the native oxide passivation in the base region. The above 
results clearly indicate that β at low (and medium) collector current regimes and the 
breakdown voltage are increased. Additionally, the base current is decreased for the 
devices with passivation compared to those without passivation, which is beneficial to 
low-power electronics and communication applications. 

Fig. 10. Typical Gummel plots of InGaP/GaAs HBTs with and without LPO passivation. 

5. Conclusion 

The InGaP/InGaAs/GaAs MOS-PHEMT with the In0.49Ga0.51P oxide as the gate insulator 
prepared by LPO has been demonstrated. As compared to the conventional InGaP PHEMT, 
the proposed InGaP MOS-PHEMT reduces the gate leakage current by at least two orders of 
magnitude, roughly doubles the breakdown voltage (from -6.5 V to -14.1 V), and enhances 
the gate voltage swing. Also, the pulse transient measurement shows much less impact of 
surface trap effects for the InGaP MOS-PHEMT. In addition, as compared to the 
conventional InGaP/GaAs HBTs without surface passivation, the HBTs with LPO 
passivation possess lower surface recombination currents, higher breakdown voltage and 
higher dc current gain. The HBTs with LPO passivation exhibit an approximately sevenfold 
improvement in current gain at low collector current regimes through the reduction of 
surface recombination current, as compared to those without passivation. Therefore, the 
proposed low-temperature and low-cost LPO can easily be implemented and can provide 
new opportunities in device applications. 

1. Introduction 

With the development of more highly integrated electronic devices, the quality of 
Czochralski (CZ) silicon must be improved. On one hand, voids near the surface of wafers 
degrade the gate oxide integrity (GOI) of MOS devices and therefore reduce device yield. 
On the other hand, the oxygen concentration of CZ silicon used for ultra large scale 
integrated circuits (ULSI) tends to become lower, so it becomes difficult to form oxygen 
precipitates and create gettering sites in the bulk for undesirable metallic contaminants on 
silicon wafers. In addition, as wafer diameters increase, dislocations are generated more 
easily due to higher thermal and gravitational stresses; it is therefore desirable to enhance 
the mechanical properties of wafers. 

As an important part of the novel "impurity engineering" concept for CZ silicon materials 
proposed by our group (Chen et al., 2010; Chen & Yang, 2009; Yang et al., 2009), the 
behavior of germanium in CZ silicon has attracted considerable attention in recent years. 
Compared to normal dopant elements, germanium doping does not induce electrical centers 
such as shallow thermal donors, because germanium is isovalent with silicon. Furthermore, 
the solubility of germanium in silicon is so large that germanium doping does not influence 
the growth of CZ silicon if the germanium concentration is lower than 10¹⁹ cm⁻³. It is also 
believed that germanium doping in CZ silicon is much easier to control, so that the 
influence of germanium doping on the properties of CZ silicon wafers can be adjusted. 
Recently, we have investigated the effect of germanium at concentrations of 10¹⁵-10¹⁹ cm⁻³ 
on the mechanical strength, the formation of oxygen-related donors, oxygen precipitation 
and void defects in CZ silicon materials. It has been established that the mechanical 
strength of silicon wafers can be improved by germanium doping, which improves the 
production yield of wafers (Chen et al., 2008). It is also found that germanium suppresses 
thermal donors (TDs) and new donors (NDs), which benefits the stability of the electrical 
properties of wafers (Cui et al., 2006; Li et al., 2004b). More importantly, germanium has 
been found to suppress the formation of crystal originated particles (COPs) related to void 
defects, which can then be annihilated easily during high temperature treatments (Chen et 
al., 2007a; Yang et al., 2002). Meanwhile, oxygen precipitation is enhanced by germanium 
doping (Chen et al., 2009; Chen et al., 2006a; Chen et al., 2006b; Li et al., 2004a), and 
therefore the internal gettering (IG) capability can be improved (Chen et al., 2007b; Chen et 
al., 2007c). Owing to these novel properties induced by germanium atoms, germanium-doped 
CZ (GCZ) silicon is considered likely to become one of the new types of silicon material 
that meet the requirements of higher-performance ULSI. 

In this chapter, the behavior of germanium-doped CZ silicon is reviewed, mainly based on 
our recent work, and two preliminary applications of GCZ silicon wafers are shown as 
examples. 

2. Mechanical strength 

By alloying with oxygen and some dopants, such as nitrogen, the mechanical strength of 
silicon single crystals can be increased. The strengthening is believed to be associated with 
impurity concentrations and dislocation densities. For example, nitrogen-doped float zone 
(FZ) silicon shows a much higher yield strength than ordinary FZ silicon because nitrogen 
atoms harden silicon crystals by congregating on dislocations and locking them (Kishino et 
al., 1982; Yonenaga, 2005). Similarly, germanium doping is considered to improve the 
mechanical strength of silicon crystals by immobilizing dislocations and retarding their 
velocity when the germanium doping level exceeds 6×10¹⁹ cm⁻³ (Fukuda & Ohsawa, 1992). 
Furthermore, dislocation-free CZ silicon crystals can be obtained using a heavily 
germanium-doped seed without Dash necking (Huang et al., 2003). Recently, we have 
emphasized that light germanium doping also benefits the mechanical strength of CZ 
silicon wafers. 

Table 1 lists the statistical Total Thickness Variation (TTV), Warp and Bow data from 100 
pieces of as-processed wafers during mass production for both CZ and GCZ silicon (with a 
germanium level of 10¹⁸ cm⁻³) (Chen et al., 2008). Normally, Warp represents the total 
maximum variation between the median and reference surfaces of a wafer, while Bow is 
defined as half the dispersion between the concave and convex maxima of the median 
surface relative to the reference surface. Both characterize the extent of warpage of silicon 
wafers and are controlled extensively in production lines: the smaller they are, the slighter 
the warpage. As can be seen in Table 1, both the Warp and Bow values were relatively 
smaller in percentage for the GCZ silicon wafers than for the CZ silicon wafers, indicating 
that germanium doping in silicon hardly tends to cause warpage during wafer 
manufacturing from monocrystalline ingots. Moreover, the slightly smaller values for the 
GCZ wafers than for the CZ wafers show that the mechanical strength of CZ wafers might 
be improved slightly by germanium doping, which coincides with the higher yield of 
polished wafers obtained for GCZ silicon during the overall wafer manufacturing: the 
yields of polished CZ and GCZ silicon wafers were 89.9% and 92.8%, respectively. It is 
therefore concluded that light germanium doping slightly suppresses the warpage of CZ 
silicon wafers. It is considered that, compared with normal CZ silicon, grown-in oxygen 
precipitation is enhanced in GCZ silicon, which will be discussed below. The enhanced 
grown-in oxygen precipitates can then pin dislocations and retard their movement, so that 
the macroscopic mechanical strength of GCZ silicon wafers is increased. 

It is believed that the novel concept of "mechanical strength improvement by germanium 
doping" is of great merit and is not limited to silicon wafers used for ICs. In particular, it 
is worthwhile to point out that this concept could be adopted in improving the wafer 
production yield and in producing ultrathin supporting wafers for solar cells. 

Table 1. Data dispersion degrees of TTV, Warp and Bow for polished CZ and GCZ silicon 
wafers during manufacturing (from 300 pieces) (Chen et al., 2008). 

To clarify the mechanical strength of as-processed silicon wafers in more detail, 
indentation tests performed at room temperature followed by high temperature annealing, 
one of the popular approaches for investigating the behavior of dislocations in silicon 
wafers (Akatsuka et al., 1997; Fukuda & Ohsawa, 1992), were also adopted in our 
investigation. Fig. 1 shows typical optical images of the indentation (as indented) and the 
rosette pattern of punched out dislocations (PODs) introduced by indentation (after a 
1100 °C/2 h anneal) in a GCZ silicon wafer. Here, the POD diffusion length serves as a 
measure of the mechanical strength of silicon wafers. From the rosette sizes in the GCZ 
silicon wafers with germanium concentrations from 10¹⁶ to 10¹⁹ cm⁻³ subjected to a 
1100 °C/2 h anneal (Chen et al., 2008), it can be seen that the mechanical strength was 
improved by germanium doping. With increasing germanium doping level, the POD 
diffusion length decreases, which should be ascribed to the intensive pinning of 
dislocations by micro-defects (such as small-sized oxygen precipitates). 

Fig. 1. Optical images of (a) the indentation and (b) the rosette pattern of PODs introduced 
by indentation in the GCZ silicon wafer subjected to a 1100 °C/2 h anneal; (c) rosette size in 
the GCZ silicon wafers with different germanium doping levels subjected to a 1100 °C/2 h 
anneal (Chen et al., 2008). 

During ULSI device fabrication, the mechanical strength during thermal processing affects 
the yield loss from wafer cracking and even the lithography accuracy. Considering this, 
indentation tests on thermally treated silicon wafers have been carried out after various 
pre-annealing steps. Fig. 2 shows the optical images of PODs for the CZ and GCZ silicon, 
which were annealed at 800 °C for 16 h, with or without a subsequent anneal at 1000 °C for 
4 h. Amorphous silicon and dislocations can form around the indentation at room 
temperature under the highly localized stress (Minowa & Sumino, 1992). When high 
temperature annealing is applied, the amorphous silicon tends to transform into heavily 
dislocated crystalline silicon and the dislocations begin to move so as to release the stress. 
The travel distance of PODs in the GCZ silicon samples after the 800 °C/16 h anneal was 
found to be somewhat shorter than that in the CZ silicon samples, whereas after the 
800 °C/16 h + 1000 °C/4 h anneal, the travel distance of PODs in the GCZ silicon sample 
was unambiguously shorter than that in the CZ silicon samples. These phenomena are 
consistent with the conclusions drawn from the fracture strength measurement (Chen et 
al., 2008). 





Fig. 2. Typical optical micrographs for the scratch-introduced CZ and GCZ silicon samples 
annealed at 1000°C/2.5 h. (a) CZ, 800°C/16h, (b) GCZ, 800°C/16h; (c) CZ, 800°C/16h + 
1000°C/4h; (d) GCZ, 800°C/16h + 1000°C/4h. 

Additionally, the influence of the germanium doping level in CZ silicon on the mechanical 
strength during device fabrication has been clarified by stress-strain testing. 
Rectangular-parallelepiped samples cut from 2000 µm thick normal CZ and GCZ silicon wafers 
(GCZ2 and GCZ3, with germanium doping levels of 10¹⁷ and 10¹⁸ cm⁻³, respectively) were 
investigated after a low-high two-step pre-anneal (800°C for 16h + 1000°C for 4h). The 
typical stress-strain curves for the CZ and GCZ silicon samples in Fig. 3 indicate that a 
higher germanium content raises the critical fracture stress (Chen et al., 2008). It is 
considered that the strain field introduced by germanium doping does not directly suppress 
dislocations; rather, the germanium-doping-related small-sized but higher-density oxygen 
precipitates in GCZ silicon account for its higher mechanical strength compared with normal 
CZ silicon wafers. 

Light germanium doping at concentrations of 10¹⁶-10¹⁹ cm⁻³ is expected to introduce a 
compressive strain field into the silicon matrix because of the larger atomic size of 
germanium. Such strain fields would generally retard dislocation motion through the 
potential barrier associated with the interaction between dislocations and the matrix. 
However, the lattice distortion induced by light germanium doping is too slight to retard 
dislocation motion. Instead, it is considered that germanium combines with point defects in 
CZ silicon, such as vacancies and/or interstitial oxygen, and thereby seeds oxygen 
precipitates of smaller size but higher density. 

Fig. 3. Typical stress-strain curves for the CZ and GCZ silicon samples (GCZ2 and GCZ3, 
with germanium concentrations of 10¹⁷ and 10¹⁸ cm⁻³, respectively) annealed at 
800°C/16h + 1000°C/4h (Chen et al., 2008). 

Therefore, in both the grown-in case and the thermally treated case, oxygen precipitate 
nucleation at dislocation cores could be enhanced by light germanium doping, and the 
precipitates could act as strong pinning centers against dislocation motion. From this 
viewpoint, it is reasonable that a higher concentration of germanium atoms in CZ silicon 
reduces the dislocation velocity and hence the distance the dislocations travel. 



3. Oxygen-related donors 

Oxygen-related donors, including thermal donors (TDs) and new donors (NDs), which are 
believed to form in the temperature ranges of 350-550°C (Fuller & Logan, 1957) and 
600-700°C (Capper et al., 1977), respectively, can deteriorate the electrical properties of 
wafers. Impurities such as germanium and nitrogen have been reported to retard TD formation 
(Hild et al., 1998). Based on the experimental facts, it is considered that germanium doping 
suppresses the formation of TDs but does not affect their microscopic structure, which is 
suggested to result from the reaction of germanium with point defects (such as silicon 
interstitials, boron, vacancies and interstitial oxygen dimers) in CZ silicon. In contrast, 
germanium doping enhances the formation of NDs in CZ silicon, which is proposed to be 
associated with the enhanced nucleation of oxygen precipitates by germanium doping. 

In this section, a conventional CZ silicon ingot and two GCZ silicon ingots (GCZ1 and GCZ2, 
with germanium concentrations of 10¹⁶ and 10¹⁸ cm⁻³ at the seed ends, respectively) were 
grown under almost the same conditions. Samples from different positions of the CZ and GCZ2 
silicon ingots were annealed at 650°C for 30min to annihilate as-grown TDs. The resistivity 
of the annealed samples was measured by four-point probe, and the TD concentration ([TD]) 
was converted from the resistivity according to ASTM F723-88. Fig. 4 shows the distribution 
of the as-grown TD concentration along the axial direction in the CZ and GCZ2 silicon 
crystals (Yang et al., 2004). Compared with the CZ silicon, the TD concentrations in the 
middle and tail parts of the GCZ2 silicon are much lower. The segregation coefficient of 
germanium in silicon is about 0.33, so the germanium concentration increases from the seed 
end to the tail end of the ingot. It is therefore believed that germanium suppresses the 
formation of TDs during crystal growth, so that the TD concentration is lower in the tail. 
Furthermore, the TD concentration variation in the GCZ1 ingot was similar to that in the 
GCZ2 ingot, from which it is inferred that TDs are inhibited in GCZ silicon when the 
germanium concentration is above 10¹⁶ cm⁻³. 
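
The axial rise in germanium concentration implied by a segregation coefficient below unity 
can be illustrated with the normal-freezing (Scheil) relation, C_s(g) = C_seed (1 - g)^(k - 1), 
where g is the solidified fraction. The Python sketch below is illustrative only: it assumes 
complete mixing in the melt and no diffusion in the crystal, which the text does not state, 
and the function name and the sample fractions are hypothetical. 

```python
def germanium_profile(c_seed, fraction_solidified, k=0.33):
    """Germanium concentration (cm^-3) at a given solidified fraction g.

    Scheil relation C_s(g) = k*C_0*(1 - g)**(k - 1), rewritten in terms of
    the seed-end concentration C_seed = k*C_0. Assumes a fully mixed melt
    and no solid-state diffusion (an assumption, not stated in the text).
    """
    if not 0.0 <= fraction_solidified < 1.0:
        raise ValueError("fraction_solidified must be in [0, 1)")
    return c_seed * (1.0 - fraction_solidified) ** (k - 1.0)


# Example: a GCZ2-like ingot with 1e18 cm^-3 at the seed end.
for g in (0.0, 0.5, 0.9):
    print(f"g = {g:.1f}: [Ge] = {germanium_profile(1e18, g):.2e} cm^-3")
```

Because k - 1 is negative for k = 0.33, the computed concentration grows monotonically 
towards the tail, in line with the trend described above. 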

In fact, it was also found that the TD concentrations in the GCZ samples are always lower 
than those in the CZ wafers during low-temperature annealing. In our experiments, samples 
were annealed at temperatures from 350°C to 500°C for different times to investigate the 
suppression effect of germanium on TD formation. The TD concentrations of the CZ and GCZ2 
samples are plotted as a function of annealing time in Fig. 5 (Yang et al., 2004). At 350°C 
or 500°C, [TD] hardly changes in either the CZ or the GCZ2 samples, meaning that almost no 
donors are generated at these temperatures. At 400°C, [TD] increases with annealing time, 
but the rate of increase in the GCZ2 samples is lower than in the CZ samples. At 450°C, [TD] 
rises most rapidly of all the annealing temperatures, yet [TD] in the GCZ2 samples still 
increases more slowly than in the CZ silicon. That is, germanium doping suppresses the 
formation of TDs. 
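
The [TD] values discussed above are obtained from four-point-probe resistivity via ASTM 
F723-88. As a rough illustration of the idea only (not the ASTM procedure), the Python 
sketch below assumes boron-doped p-type material that remains p-type, a constant hole 
mobility, fully ionized double donors, and resistivity values measured both with TDs present 
and after the donor-annihilation anneal. All function names and numbers are hypothetical. 

```python
Q = 1.602176634e-19  # elementary charge, C


def hole_density(resistivity_ohm_cm, mobility_cm2_per_vs=450.0):
    """Hole density p = 1/(q*mu*rho) in cm^-3 (constant mobility assumed)."""
    return 1.0 / (Q * mobility_cm2_per_vs * resistivity_ohm_cm)


def thermal_donor_concentration(rho_with_td, rho_after_kill, mobility=450.0):
    """Estimate [TD] (cm^-3) in p-type silicon from two resistivities.

    Each thermal double donor is assumed to compensate two holes, so
    [TD] = (p_after_kill - p_with_td) / 2. Valid only while the sample
    stays p-type; this is a simplification, not the ASTM F723 conversion.
    """
    p_with = hole_density(rho_with_td, mobility)
    p_without = hole_density(rho_after_kill, mobility)
    return (p_without - p_with) / 2.0


# Hypothetical example: 12 ohm*cm with TDs, 10 ohm*cm after the 650 C anneal.
print(f"[TD] ~ {thermal_donor_concentration(12.0, 10.0):.2e} cm^-3")
```

A real conversion would use the doping-dependent mobility built into the ASTM tables rather 
than the single assumed value used here. 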

Fig. 4. Distribution of the as-grown [TD] along the axial direction in the CZ and GCZ2 
silicon (Yang et al., 2004). 

Fig. 5. TD concentrations of the CZ (a) and GCZ2 (b) samples as a function of annealing 
time (Yang et al., 2004). 

The low-temperature Fourier transform infrared (FTIR) absorption spectra of thermal donors 
(TDs) in GCZ silicon were found to be similar to those in CZ silicon, although the 
absorption intensities differ. Therefore, it is considered that light germanium doping 
suppresses the formation of TDs but does not affect their microscopic structure. Fig. 6 
shows the low-temperature FTIR spectra of the CZ and GCZ samples in the far-IR 
(350-650 cm⁻¹) and mid-IR (650-1200 cm⁻¹) ranges, respectively (Cui et al., 2006). As can be 
seen in Fig. 6(a), a series of individual FTIR absorption lines related to TDs in silicon is 
observed in both the CZ and GCZ silicon. These absorption lines are caused by transitions of 
neutral TDDs into the conduction band at the low temperature of 10K, and different 
absorption lines correspond to different donor energy levels (Wagner & Hage, 1989); the 
neutral donors in the GCZ sample have the same energy levels as those in the CZ sample. A 
similar situation is found in the low-temperature FTIR absorption spectra of the CZ and GCZ 
silicon in the range 650-1200 cm⁻¹ shown in Fig. 6(b). These absorption lines are reported 
to correspond to singly ionized TDs (Wagner & Hage, 1989). The FTIR absorption spectrum of 
the singly ionized donors in the GCZ silicon clearly agrees well with that in the CZ silicon 
in its line positions, although the intensities differ markedly. These results further 
confirm that the TDs in both silicon samples are thermal double donors (TDDs) with the same 
energy levels and microstructures. Therefore, it is considered that germanium doping in 
silicon suppresses the generation of TDs but has little influence on their structure, which 
differs from the results for silicon with a high germanium content (GeSi). In GeSi, the TD 
absorption was found to appear as broad bands in FTIR spectra measured at low temperature 
(Hild et al., 1998). 

Fig. 6. (a) Low-temperature far-IR spectra of the CZ and GCZ silicon samples subjected to 
650°C/30min + 450°C/4h annealing, (b) low-temperature mid-IR spectra of the CZ and GCZ 
silicon samples subjected to 450°C/30min + 450°C/4h annealing. TDDn refers to the nth 
(n = 1-5) neutral donor in Fig. 6(a) and singly ionized donor in Fig. 6(b) (Cui et al., 
2006). 

When isoelectronic germanium atoms are incorporated into the silicon lattice, they occupy 
substitutional sites and usually increase the internal stress. During crystal growth, point 
defects can interact with germanium atoms. Vacancies tend to combine with germanium atoms to 
form Ge-Vn complexes, which have been identified by DLTS measurements in GCZ silicon 
crystals (Budtz-Jorgensen et al., 1998). We have clarified that germanium can enhance the 
nucleation of oxygen precipitates in the wide temperature range of 650-1200°C, which is 
explained on the basis of assumed Ge-O and Ge-O-V complexes. Normally, the TDs generated 
around 450°C are due to the aggregation of oxygen atoms (Kaiser et al., 1958). The molar 
volume of TDs is larger than that of silicon; thus, during TD formation the lattice strain 
must be released by absorbing free vacancies, whose concentration is greatly decreased by 
the formation of Ge-V complexes. Meanwhile, the generation of TDs is a process of oxygen 
clustering, so the interaction between germanium and oxygen atoms, together with the 
interaction of Ge-V complexes with the fast-diffusing O2i dimers, reduces the oxygen flux 
available to form small oxygen clusters at lower temperatures and therefore suppresses TD 
formation. 

As for the effect of germanium doping on NDs, it has been reported that germanium 
suppresses the formation of NDs in heavily germanium-doped silicon (Babitskii et al., 1985), 
and it was also suggested that the generation rates of oxygen precipitates and NDs are 
lowered by the lattice deformation caused by germanium doping in silicon (Babich et al., 
1995; Babitskii et al., 1988). However, our investigation showed the opposite result in 
lightly germanium-doped silicon. 

Both the CZ and GCZ2 silicon were annealed at 650°C for up to 128h, and the ND 
concentrations ([ND]) in the wafers are plotted as a function of annealing time in Fig. 7 
(Li et al., 2004b). [ND] in both materials increased with annealing time at 650°C owing to 
the formation of NDs. However, the ND formation rate in the GCZ2 sample is dramatically 
higher than in the CZ one, so that the conductivity type of the GCZ2 silicon reversed from 
p-type (all the original CZ silicon ingots are boron doped) to n-type after annealing for 
128h, meaning that a large number of NDs were generated owing to the enhancement of ND 
formation by germanium. In addition, the change in oxygen concentration of the annealed 
samples shows that more oxygen atoms precipitated in the GCZ2 samples than in the CZ samples 
after 650°C/128h annealing. NDs are generally considered to be larger oxygen clusters than 
TDs, and to be nuclei of oxygen precipitates formed during lower-temperature anneals 
(Pensl et al., 1989). The enhanced ND formation by germanium doping is therefore believed to 
be related to the enhancement of oxygen precipitation. As germanium can enhance the 
nucleation of oxygen precipitates through Ge-O complexes, some precipitate nuclei may become 
NDs. Thus, it is reasonable to suggest that most of these denser, small oxygen precipitate 
nuclei become electrically active NDs during the 650°C anneal. However, when the germanium 
concentration is much larger than the oxygen concentration, most of the oxygen is trapped by 
germanium to form Ge-O complexes, which reduces the oxygen flux available to form NDs. The 
formation of NDs is then suppressed, as reported by Babitskii et al. (1985). 

Fig. 7. ND concentration in the CZ and GCZ2 wafers annealed at 650°C as a function of the 
annealing time (Li et al., 2004b). 



4. Oxygen precipitation 

Oxygen precipitates, the main micro-defects in CZ silicon, especially in wafers used for 
bulk-isolated devices in the early years, can not only deteriorate the electrical properties 
themselves but also induce secondary defects such as stacking faults and dislocations, which 
increase the breakdown current of devices. However, oxygen precipitates of suitable density 
in the bulk benefit both the mechanical properties and the internal gettering capability of 
wafers. The supersaturated interstitial oxygen atoms in CZ silicon aggregate during the 
post-growth anneal experienced in the crystal puller, resulting in so-called as-grown 
(grown-in) oxygen precipitates. It is also widely accepted that supersaturated oxygen atoms 
in the silicon matrix can precipitate and further induce secondary defects, the so-called 
bulk micro-defects (BMDs), during device fabrication. Oxygen precipitates as well as BMDs 
are believed to be gettering sites for metallic contamination. Thus, the concentration and 
distribution of oxygen precipitates in the silicon bulk normally have to be controlled so 
that their overall effect benefits the quality of the CZ silicon material. 

Germanium doping in CZ silicon is found to enhance not only as-grown oxygen precipitation 
but also oxygen precipitation during subsequent thermal anneals over a wide temperature 
range. It can also change both the distribution of BMDs and the microscopic morphology of 
oxygen precipitates, resulting in poorer thermal stability of oxygen precipitates at 
elevated temperatures. We consider that certain complexes, the so-called germanium-related 
complexes, are generated in GCZ silicon and thus change the behavior of oxygen precipitates 
in GCZ silicon. 

A CZ and two GCZ (GCZ1 and GCZ2, with [Ge] ~10¹⁶ and 10¹⁷ cm⁻³, respectively) silicon 
ingots with comparable initial oxygen concentrations were selected to investigate the 
formation of grown-in oxygen precipitates. After annealing at 1270°C/2h to erase the thermal 
history, both the CZ and GCZ silicon were cooled at a controlled rate of 0.5°C/min and taken 
out at temperatures between 1150 and 850°C. The reduction of [Oi] (Δ[Oi]) in the CZ and GCZ 
samples as a function of the take-out temperature is shown in Fig. 8(a) (Chen et al., 
2006b). Generally, the thermal history of wafers strongly influences the oxygen 
precipitation of CZ silicon during subsequent annealing, while grown-in precipitates can be 
dissolved by annealing at very high temperatures above 1250°C (Kishino et al., 1982). The 
change of [Oi] in the CZ and GCZ silicon caused by the 1270°C/2h anneal is shown in Fig. 
8(b) (Chen et al., 2006b). The ratio of the increase in [Oi] to the as-received [Oi] is 
somewhat smaller in the CZ silicon than in the GCZ silicon, which indicates that more 
grown-in precipitates are present in the GCZ silicon than in the CZ silicon. It is 
considered that germanium enhances the formation of grown-in oxygen precipitates during 
crystal growth. Fig. 8(a) also shows that [Oi] in the GCZ2 silicon decreased much more 
strongly than in the CZ silicon over the whole temperature range, and that [Oi] in the GCZ1 
silicon decreased slightly more than in the CZ silicon below 1050°C, indicating that oxygen 
precipitates more easily in the GCZ silicon crystals, even at temperatures higher than 
1150°C. 

Fig. 8. (a) Evolution of Δ[Oi] in the CZ and GCZ silicon after 1270°C/2h pre-anneal as a 
function of the take-out temperature during the cooling-down process, (b) Δ[Oi] of the CZ 
and GCZ silicon before and after annealing at 1270°C/2h (Chen et al., 2006b). 

A ramping-up anneal was also performed to investigate the effect of germanium on as-grown 
oxygen precipitation in GCZ silicon. Samples were heated at a rate of 1°C/min starting from 
750°C, 850°C, 950°C or 1050°C and ending at 1050°C, followed by an isothermal anneal at 
1050°C for 16 h. The Δ[Oi] as a function of the starting temperature of the ramp is shown in 
Fig. 9 (Chen et al., 2006b). It is believed that 1°C/min is a suitable heating rate for 
growing oxygen precipitate nuclei whose radius is larger than the critical nucleation radius 
of oxygen precipitates (rc) at the starting temperature of the ramp, while the formation of 
new oxygen precipitate nuclei during the ramp is suppressed (Kissinger et al., 1998). 
Accordingly, the amount of oxygen precipitated after the 1050°C/16h anneal for each starting 
temperature is considered to be roughly related to the grown-in oxygen precipitates whose 
radius is larger than rc at that starting temperature. Thus, as the starting temperature 
increases, the number of grown-in precipitates larger than rc decreases, which reduces the 
amount of precipitated oxygen. As can be seen, the decrease in [Oi] (Δ[Oi]) of the GCZ 
wafers was much larger than that of the CZ wafers at every starting temperature, which 
indicates that more grown-in oxygen precipitates were generated in the GCZ wafers than in 
the CZ wafers in the investigated temperature range (between 850 and 1050°C). Meanwhile, the 
[Oi] curve of the GCZ2 wafer is shifted to the right relative to that of the CZ wafer, as 
shown in Fig. 9. In this case, there is a sharp decrease of [Oi] when the starting 
temperature is below 950°C, which means that the radius of the majority of as-grown oxygen 
precipitates in the GCZ wafers was smaller than rc at 950°C, while that of most as-grown 
oxygen precipitates in the CZ wafer was smaller than rc at 850°C. It is therefore believed 
that germanium incorporation raises the formation temperature of as-grown oxygen 
precipitates during the cooling-down stage of crystal growth, so that larger as-grown oxygen 
precipitates are present in GCZ silicon when cooling is complete. 
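
The argument above relies on the critical radius rc growing with temperature. A minimal 
Python sketch of this trend, based on classical nucleation theory, rc = 2σΩ / (kT ln([Oi]/Ceq)), 
is given below. It is not taken from the cited work: strain and point-defect terms are 
neglected, any effect of germanium is ignored, and the oxygen solubility fit, the 
interfacial energy σ, the volume per precipitated oxygen atom Ω and the [Oi] value are 
assumed, illustrative numbers. 

```python
import math

K_B = 8.617333262e-5  # Boltzmann constant, eV/K


def oxygen_solubility(temp_c):
    """Equilibrium [Oi] in cm^-3 (assumed Arrhenius fit, for illustration)."""
    return 9.0e22 * math.exp(-1.52 / (K_B * (temp_c + 273.15)))


def critical_radius_nm(temp_c, oi_cm3, sigma_ev_nm2=2.56, omega_nm3=0.0225):
    """Classical critical radius (nm) of a spherical oxide precipitate.

    sigma_ev_nm2: interfacial energy (assumed, about 0.41 J/m^2).
    omega_nm3: precipitate volume per oxygen atom (assumed).
    Returns infinity when the oxygen is not supersaturated.
    """
    supersaturation = oi_cm3 / oxygen_solubility(temp_c)
    if supersaturation <= 1.0:
        return math.inf
    thermal_energy = K_B * (temp_c + 273.15)
    return 2.0 * sigma_ev_nm2 * omega_nm3 / (thermal_energy * math.log(supersaturation))


# Hypothetical wafer with [Oi] = 8e17 cm^-3 at the ramp starting temperatures.
for t in (750, 850, 950, 1050):
    print(f"{t} C: rc ~ {critical_radius_nm(t, 8e17):.2f} nm")
```

With these assumed parameters rc increases steadily from 750°C to 1050°C, so fewer grown-in 
precipitates exceed rc when the ramp starts at a higher temperature. 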

Fig. 9. Evolution of Δ[Oi] in the CZ and GCZ silicon as a function of the starting 
temperature of the ramping process with a 1°C/min ramping-up rate (Chen et al., 2006b). 

It is also suggested that germanium doping enhances oxygen precipitation in CZ silicon 
wafers over a wide temperature range (from 550 to 1050°C) during subsequent annealing. 
As-received samples of both the CZ and GCZ silicon were put into a diffusion furnace at 
temperatures of 550-950°C, in steps of 100°C, and isothermally annealed for 2-64h, followed 
by annealing at 1050°C for 16h. All the thermal treatments were performed in an argon 
atmosphere. Fig. 10 shows Δ[Oi] in the CZ and GCZ silicon annealed for 64 h as a function of 
the annealing temperature (Chen et al., 2006a). As can be seen, Δ[Oi] in both the CZ and GCZ 
silicon samples was larger for annealing above 850°C. Moreover, Δ[Oi] was always larger in 
the GCZ silicon than in the CZ silicon. It is therefore suggested that germanium doping 
enhances oxygen precipitation in CZ silicon over a wide temperature range. 

Fig. 10. Left: Δ[Oi] in the CZ and GCZ silicon annealed for 64 h as a function of annealing 
temperature. Right: Optical micrographs of the BMDs in the CZ (a) and GCZ (b) silicon 
subjected to 950°C/64 h anneal (Chen et al., 2006a). 

Normally, oxygen precipitate growth is limited by oxygen diffusion, especially at low 
temperatures (Joly & Robert, 1994). At low annealing temperatures (such as 750°C or below), 
the diffusivity of oxygen is very small, so the growth of oxygen precipitates is 
insignificant. However, some oxygen atoms can still aggregate into precipitate nuclei and 
embryos, so that Δ[Oi] in the GCZ silicon subjected to a 64h anneal at 750°C is somewhat 
larger than that in the CZ silicon. It is thus believed that the formation of precipitate 
nuclei is enhanced by germanium doping. When the silicon wafers are annealed at higher 
temperatures (such as 950°C and above), the oxygen diffusion coefficient increases greatly, 
while the supersaturation of interstitial oxygen in the silicon crystal decreases. In this 
case, oxygen precipitation is based primarily on the as-grown precipitate nuclei (Borghesi 
et al., 1995). Typical optical micrographs of the BMDs induced by oxygen precipitates in the 
CZ and GCZ silicon samples subjected to the 950°C/64 h anneal are also shown in Fig. 10. The 
BMD density is clearly much higher in the GCZ silicon than in the CZ silicon. It is 
generally believed that the sizes of grown-in oxygen precipitates follow a Boltzmann 
distribution. Only the oxygen precipitates with a radius larger than rc at the annealing 
temperature can survive and grow further. Accordingly, the density of grown-in oxygen 
precipitates with a radius larger than rc at 950°C is much higher in the GCZ silicon than in 
the CZ silicon. That is, again, germanium doping enhances the formation of larger grown-in 
oxygen precipitates during crystal growth. 
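
The contrast between diffusion-limited growth at 750°C and faster growth at 950°C can be 
made concrete with an Arrhenius estimate of the interstitial oxygen diffusivity. The Python 
sketch below uses a commonly quoted literature fit, D = 0.13 exp(-2.53 eV / kT) cm²/s, which 
is an assumption here and is not taken from the works cited in this chapter; the function 
names are hypothetical. 

```python
import math

K_B = 8.617333262e-5  # Boltzmann constant, eV/K


def oxygen_diffusivity(temp_c, d0_cm2_s=0.13, ea_ev=2.53):
    """Interstitial oxygen diffusivity in silicon, cm^2/s (assumed fit)."""
    return d0_cm2_s * math.exp(-ea_ev / (K_B * (temp_c + 273.15)))


def diffusion_length_um(temp_c, hours):
    """Characteristic diffusion length sqrt(D*t) in micrometres."""
    d = oxygen_diffusivity(temp_c)
    return math.sqrt(d * hours * 3600.0) * 1.0e4  # cm -> um


# Compare the 64 h anneals discussed in the text.
for t in (750, 950):
    print(f"{t} C / 64 h: sqrt(D*t) ~ {diffusion_length_um(t, 64):.1f} um")
```

Under these assumed parameters the oxygen diffusion length for a 64 h anneal is roughly ten 
times larger at 950°C than at 750°C, which is consistent with precipitate growth being 
negligible at the lower temperature. 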

Furthermore, even if the as-grown oxygen precipitates are eliminated by high-temperature 
annealing, oxygen precipitation in GCZ silicon wafers during subsequent thermal cycles is 
still enhanced by germanium doping. Fig. 11 shows Δ[Oi] of the thermal-history-eliminated CZ 
and GCZ silicon wafers subjected to the 1050-1150°C/2h anneal. As can be seen, slightly more 
oxygen precipitated in the GCZ silicon wafer than in the CZ silicon, which is ascribed to 
the germanium-related complexes formed in the GCZ silicon acting as nucleation embryos. 

Fig. 11. Δ[Oi] of the thermal-history-eliminated CZ and GCZ silicon wafers subjected to the 
1050-1150°C/2h anneal. 

Owing to the enhanced nucleation of oxygen precipitates at low temperatures by germanium 
doping, oxygen precipitation during subsequent annealing is undoubtedly enhanced. Fig. 12 
compares the [Oi] and the optical micrographs of BMDs, which correspond to oxygen 
precipitates and induced defects, in the CZ and GCZ silicon subjected to two-step anneals 
consisting of different low-temperature pre-anneals followed by the same high-temperature 
anneal at 1050°C (Chen et al., 2006a). In these experiments, the precipitate nuclei that 
survived the pre-anneal coarsen during the subsequent high-temperature anneal, since oxygen 
precipitates characteristically grow at high temperatures. As can be seen on the left of 
Fig. 12, with pre-anneals at 650 and 750°C, [Oi] in the CZ silicon decreased almost to the 
oxygen solubility at 1050°C, while [Oi] in the GCZ silicon remained at a somewhat higher 
level. Correspondingly, dense BMDs of larger size were formed in the CZ silicon, while 
denser BMDs of smaller size were generated in the GCZ silicon, as shown in Figs. 12R(a) and 
12R(b). With the 850°C pre-anneal, [Oi] in the CZ silicon remained at a level much higher 
than the oxygen solubility at 1050°C, while [Oi] in the GCZ silicon was much lower. 
Moreover, low-density BMDs of smaller size were formed in the CZ silicon, while high-density 
BMDs of larger size were formed in the GCZ silicon, as illustrated in Figs. 12R(c) and 
12R(d). Consequently, it is evident that oxygen precipitation is greatly enhanced by 
germanium doping during low-high two-step annealing. 

Fig. 12. Left: [Oi] in the CZ and GCZ silicon subjected to 1050°C/16h anneal following 64h 
pre-anneals at different temperatures of 650-850°C. Right: Optical micrographs of BMDs in 
the CZ and GCZ silicon subjected to two-step anneals: (a) CZ, 650°C/64h+1050°C/16h, (b) 
GCZ, 650°C/64h+1050°C/16h, (c) CZ, 850°C/64h+1050°C/16h, and (d) GCZ, 
850°C/64h+1050°C/16h (Chen et al., 2006a). 

Generally, the nuclei of oxygen precipitates formed at lower temperatures have a size 
distribution, and not all of them survive subsequent thermal cycles: the smaller nuclei 
dissolve while the larger ones grow. As shown in Figs. 12R(a) and 12R(b), denser BMDs of 
smaller size were generated in the GCZ silicon than in the CZ silicon, which is probably due 
to the much larger number of nuclei formed at the lower temperatures as a result of 
germanium doping. The many nuclei in the GCZ silicon compete for interstitial oxygen atoms. 
Therefore, oxygen precipitation in the GCZ silicon was retarded to a certain extent during 
the 1050°C anneal, and [Oi] decreased while the BMD density increased in the GCZ silicon 
when the annealing duration was prolonged. That is, germanium doping greatly enhances the 
nucleation of oxygen precipitates at low temperatures, especially below 750°C. For the 
850°C/64h pre-anneal, only the oxygen precipitates larger than rc at 850°C can survive and 
grow further in the subsequent 1050°C anneal. Most of the grown-in oxygen precipitates in 
the CZ silicon are smaller than rc at 850°C, so oxygen precipitation during the 1050°C/16h 
anneal is slight. In contrast, Figs. 12R(c) and 12R(d) show that oxygen precipitate 
nucleation is enhanced in the GCZ silicon during the 850°C/64h anneal. Consequently, 
germanium doping raises the onset temperature for precipitate nucleation in GCZ silicon up 
to 850°C, whereas it is usually below 750°C in CZ silicon. Furthermore, it is considered 
that the critical radius rc at 850°C is reduced by germanium doping, so that oxygen 
precipitates of smaller radius can form and survive in the GCZ silicon. 

The morphology of oxygen precipitates in GCZ silicon differs from that in CZ silicon after 
various thermal treatments. Fig. 13 shows transmission electron microscopy images of the 
oxygen precipitates and induced defects in CZ and GCZ samples subjected to 800°C/225h and 
1000°C/225h anneals, respectively (Yang et al., 2006b). After the prolonged anneal at 800°C, 
platelet precipitates were typical in the CZ silicon, while particle-like precipitates were 
generated in addition to platelets in the GCZ silicon [Figs. 13(a) and 13(b)]. After 
1000°C/225h annealing, the oxygen precipitates in the CZ silicon were mainly polyhedral, 
while entangled, mixed morphologies consisting of polyhedra and platelets were formed in the 
GCZ silicon [Figs. 13(c) and 13(d)]. It is reported that platelet oxygen precipitates 
dissolve more easily than polyhedral ones (Shimura, 1994); thus, the change in the 
microscopic morphology of oxygen precipitates in GCZ silicon could reduce their thermal 
stability at high temperatures. 

Fig. 14(a) shows [Oi] for both the CZ and GCZ silicon wafers in the as-grown state, after 
1270°C/1h conventional furnace annealing (CFA) and after 1280°C/60s rapid thermal annealing 
(RTA), respectively. [Oi] after the CFA or RTA treatment was higher than in the as-grown 
state for both the CZ and GCZ silicon, owing to the dissolution of grown-in oxygen 
precipitates. Germanium doping in CZ silicon could reduce the thermal stability of grown-in 
oxygen precipitates by generating platelet-shaped precipitates. Furthermore, the oxygen 
concentration in both the CZ and GCZ silicon as a function of the duration of RTA performed 
at 1260°C after pre-annealing at 800 or 1000°C for 225h is shown in Fig. 14(b). As can be 
seen, [Oi] is recovered and is slightly higher in the GCZ than in the CZ silicon, indicating 
the easier dissolution of oxygen precipitates in the GCZ silicon. 




Fig. 13. Transmission electron microscopy images of the oxygen precipitates in the annealed 
CZ and GCZ silicon: (a) CZ, 800°C/225h, (b) GCZ, 800°C/225h, (c) CZ, 1000°C/225h, (d) 
GCZ, 1000°C/225h (Yang et al., 2006b). 

Fig. 14. (a) [Oi] in both the CZ and GCZ silicon wafers in the as-grown state, after 
conventional furnace annealing (CFA) at 1270°C/1h and after rapid thermal annealing (RTA) at 
1280°C/60s. (b) Variation of the oxygen recovery (δ[Oi]) in both the CZ (square points) and 
GCZ (circle points) silicon as a function of the duration of RTA performed at 1260°C after 
pre-annealing at 800°C (full points) or 1000°C (open points) for 225h (Chen et al., 2007d). 

Germanium atoms occupy substitutional sites in CZ silicon and induce distortion and local 
stress in the silicon lattice because of their larger atomic radius. The lattice sites 
occupied by germanium atoms are therefore potentially active and tend to interact with other 
structural defects and/or impurities. Ge-Vm or Ge-Vm-On (m, n>1) complexes are supposed to 
form in large numbers to relieve the lattice stress, and they can further act as 
heterogeneous nuclei that collect interstitial oxygen atoms in GCZ silicon. Because the 
oxygen content is limited, the oxygen precipitates in GCZ silicon tend to be much smaller 
than those in CZ silicon. It is suggested that vacancies in CZ silicon are gathered by 
germanium atoms to form germanium-vacancy-related complexes, which favors the generation of 
polyhedral precipitates, so that the oxygen precipitates appear with mixed morphologies in 
GCZ silicon. Normally, during high-temperature treatments the Si-O and Si-Si bonds inside 
the oxygen precipitates are easily broken, and the oxygen atoms originally located in the 
precipitates revert to interstitial oxygen atoms and finally diffuse out of the 
precipitates. Owing to the smaller size and higher density of the precipitates, the total 
surface area of the oxygen precipitates in GCZ silicon is considerably larger. The net 
oxygen flux out of the precipitates is thus enhanced, and the precipitates dissolve more 
easily in GCZ silicon. 
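
The surface-area argument can be checked with simple geometry. The Python sketch below 
assumes, purely for illustration, that a fixed total precipitate volume is divided equally 
among N identical spheres; the total area then scales as N^(1/3). Real precipitates are 
platelets or polyhedra, so this is only a qualitative guide, and the function name is 
hypothetical. 

```python
import math


def total_surface_area(n_precipitates, total_volume):
    """Total area of n equal spheres sharing a fixed total volume.

    Units are arbitrary but consistent (e.g. volume in nm^3, area in nm^2).
    """
    volume_each = total_volume / n_precipitates
    radius = (3.0 * volume_each / (4.0 * math.pi)) ** (1.0 / 3.0)
    return n_precipitates * 4.0 * math.pi * radius ** 2


# Same precipitated volume, but eight times as many (smaller) precipitates.
ratio = total_surface_area(8, 1.0) / total_surface_area(1, 1.0)
print(f"area ratio for 8x the density: {ratio:.2f}")
```

Eight times as many precipitates of the same total volume expose twice the surface area, 
which illustrates why a denser population of smaller precipitates offers a larger interface 
for oxygen to leave during high-temperature dissolution. 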



5. Void defects 

Voids, the main micro-defects in modern large-diameter silicon crystals, play an 
increasingly important role in the reliability and yield of ULSI devices. It is well 
established that voids, especially those located in the near-surface region of wafers, can 
deteriorate gate oxide integrity (GOI) and increase the leakage current of 
metal-oxide-semiconductor devices (Huth et al., 2000; Park et al., 2000). Voids result from 
the agglomeration of excess vacancies during crystal growth and are believed to be normally 
octahedral in structure, about 100-300 nm in size, with a thin oxide film of about 2 nm on 
their {111} surfaces (Itsumi et al., 1995; Yamagishi et al., 1992). It has been reported 
that during the cooling of the silicon crystal from the melting point to room temperature, 
grown-in voids are formed with densities of 10⁵-10⁷ cm⁻³ (Yamagishi et al., 1992). 

The techniques to control voids have been studied extensively over the years, and three
approaches have been widely accepted: 1) thermally controlled CZ silicon crystal growth
(Voronkov, 1982), 2) high-temperature annealing (Wijaranakula, 1994) and 3) nitrogen doping
(Yu et al., 2002). It is believed that GOI failure of devices can also be reduced by germanium
doping. The characteristics of the grown-in voids in GCZ wafers, observed as flow pattern
defects (FPDs) and crystal originated particles (COPs) [the two main manifestations of void
defects], suggest that germanium suppresses larger voids, resulting in denser and smaller
voids. Meanwhile, it has been found that germanium doping decreases the density of voids, and
that the remaining voids in GCZ silicon crystals can be eliminated easily through high
temperature annealing.





Fig. 15. Optical microscopic photographs of FPDs in the head samples of (a) CZ and (b) GCZ 
silicon crystal. (Yang et al., 2002) 









Three p-type GCZ silicon crystal ingots with different germanium concentrations ([Ge]) and a
conventional CZ silicon crystal were pulled under almost the same growth conditions. The GCZ
ingots, named GCZ1, GCZ2 and GCZ3, had [Ge] of 10¹⁵, 10¹⁶ and 10¹⁷ cm⁻³ in the head portions
and 10¹⁶, 10¹⁷ and 10¹⁸ cm⁻³ in the tail portions, respectively. Typical optical micrographs of
FPDs in the head samples of the CZ and GCZ3 silicon crystals are shown in Fig. 15 (Yang et al.,
2002). The FPD density in the GCZ3 silicon wafer was much lower than that in the CZ silicon
crystal. Similar results were also found in the tail samples. It can accordingly be concluded
that germanium doping significantly suppresses the voids in GCZ silicon crystals. The FPD
densities in the as-grown silicon wafers sliced from different portions of the four ingots are
shown in Fig. 16 (Yang et al., 2002). As can be seen, the FPD densities in the head samples of
the CZ, GCZ1 and GCZ2 silicon wafers were almost the same, while that of the head sample of
GCZ3, with a relatively higher [Ge] of 10¹⁷ cm⁻³, was much lower. For the CZ silicon crystal,
the FPD density of the tail sample was almost the same as that of the head sample. However, for
the GCZ1, GCZ2 and GCZ3 silicon crystals, the FPD densities of the tail samples were lower than
those of the head samples. Because the segregation coefficient of germanium in silicon is 0.33,
germanium is rejected into the melt during growth, so [Ge] in the tail portion of a GCZ crystal
is higher than that in the head portion. It is therefore clear that the FPD density in GCZ
silicon wafers decreases as [Ge] increases, and that the FPD density in as-grown GCZ silicon
wafers is much lower than that in conventional CZ wafers. Germanium doping in CZ silicon can
thus significantly suppress voids during crystal growth.
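
The head-to-tail rise in [Ge] that follows from a segregation coefficient below unity can be illustrated with the normal-freezing (Scheil) relation, C_s(g) = k C_0 (1 - g)^(k - 1), where g is the solidified fraction. The sketch below is only an illustration under the usual idealizations (complete mixing in the melt, no diffusion in the solid, constant k); the function name and the melt concentration are hypothetical choices, with the melt concentration picked so that the head value matches the GCZ3 head value quoted above.

```python
def scheil_concentration(c_melt_initial, k, solidified_fraction):
    """Dopant concentration in the crystal at solidified fraction g.

    Normal-freezing (Scheil) relation: C_s(g) = k * C_0 * (1 - g)**(k - 1).
    """
    if not 0.0 <= solidified_fraction < 1.0:
        raise ValueError("solidified_fraction must lie in [0, 1)")
    return k * c_melt_initial * (1.0 - solidified_fraction) ** (k - 1.0)


K_GE = 0.33  # segregation coefficient of Ge in Si, as quoted in the text

# Hypothetical initial melt concentration, chosen so that the head (g = 0)
# comes out at 1e17 cm^-3 (the GCZ3 head value quoted in the text).
C_MELT = 1e17 / K_GE

for g in (0.0, 0.5, 0.9):
    c = scheil_concentration(C_MELT, K_GE, g)
    print(f"g = {g:.1f}: [Ge] = {c:.2e} cm^-3")
```

Under these assumptions the concentration rises monotonically toward the tail, which is the qualitative trend the text relies on; the exact head-to-tail ratio depends on how much of the melt is solidified.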

Fig. 16. FPD densities in the head and tail portions of the as-grown CZ and GCZ silicon 
crystal samples with different germanium concentrations. (Yang et al., 2002) 

Furthermore, the thermal stability of FPDs in GCZ silicon is suggested to be much poorer than
that in CZ silicon. Fig. 17 shows the FPD densities in the CZ and GCZ silicon samples before
and after different anneals. As can be seen, after the 1050°C/2h anneal the FPD density in the
GCZ silicon is significantly reduced, while that in the CZ silicon remains almost constant.
Although the FPD density in the CZ silicon wafer decreased considerably after the 1150°C/2h
anneal, it was still much higher than that in the GCZ1 wafer. After the 1200°C/2h anneal,
however, the FPD densities in both the CZ and GCZ1 silicon wafers decreased to nearly the same
level. Prolonged annealing at high temperatures has no notable further effect on the
annihilation of FPDs. That is, the FPDs in the GCZ silicon crystals can be annihilated at lower
temperatures than those in the CZ crystal, implying that the thermal stability of voids in the
GCZ silicon crystals is much poorer; the voids in the GCZ silicon crystals can therefore be
eliminated by high temperature anneals with a low-cost thermal budget.

Fig. 17. FPD densities in both the CZ and GCZ silicon samples before and after different 
high temperature annealing. (Yang et al., 2002) 

Fig. 18 shows the size profiles of grown-in COPs in the CZ and GCZ silicon wafers (Yang et al.,
2006a). Compared with the CZ silicon wafer, the GCZ silicon wafers show a higher percentage of
smaller COPs (0.11-0.12 µm) and a lower percentage of larger COPs (over 0.12 µm). The total
number of grown-in COPs on the GCZ silicon wafers was actually larger than that on the CZ
wafers, meaning that germanium doping induces a higher density of COPs with smaller sizes. At
first sight, this behavior of COPs in as-grown GCZ silicon does not seem to agree with the
results of FPD detection. It is worth pointing out, however, that FPDs are believed to be
induced by larger voids, i.e., only those voids whose radius is larger than a critical radius
r_c can generate enough hydrogen bubbles during etching of the wafer surface to leave flow
patterns. As the COP results suggest, the number of larger voids in GCZ silicon crystals is
smaller than that in CZ silicon. Therefore, it is reasonable to conclude that the fewer FPDs in
the GCZ silicon samples are associated with the lack of larger voids, while the higher density
of COPs on the GCZ silicon wafers is mainly contributed by smaller voids.

Fig. 18. Density and size profiles of the COPs on (a) CZ and (b) GCZ silicon wafers. (Yang et 
al., 2006a) 

Similar to the FPDs, the COPs also show poorer thermal stability in GCZ silicon. Fig. 19 shows
the COP maps of CZ and GCZ silicon wafers sampled from the tail portions of the crystals,
before and after annealing in hydrogen at 1200°C (Yang et al., 2006a). The COP density on the
GCZ silicon was much lower than that on the CZ silicon after the annealing, indicating that
germanium doping makes COPs easier to annihilate. In the subsurface region of the annealed
wafers (for example at a depth of 30 µm), it was also found that more grown-in COPs were
annihilated in the GCZ silicon wafers than in the CZ ones. Likewise, the comparison of the COP
densities of CZ and GCZ silicon annealed in Ar or H2 atmosphere shown in Fig. 20 (Chen et al.,
2007a) indicates that germanium doping reduces the thermal stability of grown-in COPs not only
at the surface but also in the bulk of the GCZ silicon wafers. Consequently, it is suggested
that germanium doping effectively lowers the thermal stability of grown-in COPs in wafers.




[Fig. 19 panel labels: CZ, 1200°C/2h; GCZ, 1200°C/2h]



Fig. 19. COP maps of the CZ and GCZ silicon wafers before and after annealing in hydrogen 
at 1200°C. (Yang et al., 2006a) 

Fig. 20. Normalized COP densities of the CZ and GCZ silicon wafers annealed in (a) Ar or (b) 
H2 atmosphere as a function of the depth from the wafer surface. Note that the curves were 
fitted with an exponential growth function. (Chen et al., 2007a) 

Here we discuss the mechanism by which germanium doping affects void defects through the
formation of germanium-related complexes. Germanium atoms are considered to react with the
intrinsic point defects in CZ silicon crystals, so that the formation of vacancy-based
micro-defects, such as the P-band and voids, is influenced by germanium doping. Meanwhile, the
germanium atoms located at substitutional sites of the silicon lattice cause lattice distortion
and lattice stress. To relieve this stress, germanium tends to react with vacancies and/or
oxygen to form Ge-V_m or Ge-V_m-O_n (m, n ≥ 1) complexes when GCZ wafers are annealed at high
temperatures; these complexes survive at low temperatures and become the nuclei of oxygen
precipitates. Thus, prior to the nucleation of voids, the nuclei of oxygen precipitates can grow
by the rapid diffusion of oxygen and the absorption of a considerable number of vacancies at
high temperatures. Accordingly, the number of surviving vacancies that contribute to the
formation of voids during the subsequent cooling is reduced.

The driving force for void formation is the gain in volume free energy per vacancy associated
with vacancy supersaturation, i.e., the vacancy chemical potential f (Voronkov & Falster,
1998):

$$ f = k_B T \ln\left(\frac{C_0}{C_e}\right) \qquad (1) $$

where k_B is Boltzmann's constant, T is the void nucleation temperature, C_e is the equilibrium
vacancy concentration, and C_0 is the initial vacancy concentration (the actual vacancy
concentration in as-grown silicon). From equation (1), it can be found that the void nucleation
temperature T will be lower when the initial vacancy concentration C_0 is reduced by germanium
doping in CZ silicon crystal: because C_e decreases on cooling, a smaller C_0 reaches the
supersaturation needed for nucleation only at a lower temperature. Therefore, voids, especially
the large-volume voids that are believed to be the origin of FPDs, are suppressed in as-grown
GCZ silicon crystal. This also explains the fact that the FPD density decreases with increasing
germanium concentration, as shown in Fig. 16. Additionally, because the formation of the
germanium-related complexes consumes a large number of vacancies, the voids form at lower
temperatures, which is reflected in Fig. 18. In fact, when the binding temperature of germanium
and vacancies, T_b, is higher than the nucleation temperature of voids, T_n, void formation will
be strongly or completely suppressed due to a lack of free vacancies (Voronkov & Falster, 2002).
Because T_b is probably higher than T_n, void formation is suppressed through the decrease in
free vacancies, which lowers C_0. According to Voronkov's results, the density N and size R of
voids in CZ silicon crystals (assuming the voids to be spheres in the silicon lattice, with
radius R) obey the following relations:

[Equations (2) and (3): expressions for the void density N and the void radius R in terms of
C_0; illegible in the source]

From these relations, one can conclude that both N and R of voids are directly related to the
initial vacancy concentration C_0. Therefore, the formation of a lower density of FPDs and of
denser COPs with smaller size is believed to be favored in GCZ silicon crystals, due to the
decrease of the initial vacancy concentration C_0 as well as the decrease of the formation
temperature T of voids. Furthermore, a higher germanium concentration in CZ silicon leads to a
higher COP density; thus the COP density in the tail portion is higher than that in the head
and middle portions of the GCZ silicon crystals, which is shown in Fig. 16. Moreover, voids in
CZ silicon usually form in a narrow temperature range of about 30°C below 1100°C during crystal
growth. They can be annihilated by annealing at elevated temperatures, especially in hydrogen,
through dissolution of the inner oxide films surrounding the voids. The removal of the oxide
films on the inner walls of grown-in voids is believed to be the first step in the annihilation
process, and it is an oxygen diffusion-determined process (Adachi et al., 2000). The second step
is the shrinkage of the voids through the diffusion of vacancies, which is a vacancy
diffusion-determined process. For GCZ silicon crystal, due to the decrease of the void formation
temperature T and the increase of the void density N, the inner oxide film of voids might be
thinner than that in CZ silicon; additionally, the volume of voids in GCZ silicon crystals is
considered to be smaller than that in CZ silicon. Therefore, the voids in GCZ silicon can be
dissolved by thermal cycles more easily than those in CZ silicon.
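
As a numerical illustration of equation (1) in this section, the sketch below evaluates the driving force f = k_B T ln(C_0/C_e) for a few values of the initial vacancy concentration. It is a minimal sketch only: the temperature and the concentrations are hypothetical numbers chosen for illustration and are not data from the cited works, and the function name is an assumption.

```python
import math

K_B_EV = 8.617333262e-5  # Boltzmann constant in eV/K


def vacancy_chemical_potential(temperature_k, c0, ce):
    """Driving force per vacancy, f = k_B * T * ln(C_0 / C_e), in eV."""
    if temperature_k <= 0 or c0 <= 0 or ce <= 0:
        raise ValueError("temperature and concentrations must be positive")
    return K_B_EV * temperature_k * math.log(c0 / ce)


# Hypothetical values for illustration only (not taken from the chapter).
T_NUCLEATION = 1373.0   # K, roughly 1100 degrees C
C_EQ = 1.0e12           # cm^-3, assumed equilibrium vacancy concentration

for c0 in (1.0e14, 5.0e13, 1.0e13):
    f = vacancy_chemical_potential(T_NUCLEATION, c0, C_EQ)
    print(f"C_0 = {c0:.1e} cm^-3 -> f = {f:.3f} eV")
```

At a fixed temperature the driving force falls as C_0 falls, which is the sense in which vacancy consumption by germanium-related complexes weakens void nucleation.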

6. Application of germanium doped Czochralski silicon: two examples 

6.1 Thick epitaxial layers on germanium doped CZ silicon substrate 

Misfit dislocations (MDs) can cause significant junction leakage in transistors, and the
generation of MDs remains a serious issue in the volume fabrication of p/p+ epi-wafers. It has
been suggested that germanium doping can suppress the epi-layer MDs on heavily boron-doped CZ
silicon substrates (Jiang et al., 2006). 50 µm thick p/p+ epi-layers were grown on conventional
heavily boron-doped (B-doped) substrates and on germanium-boron co-doped (Ge-B-co-doped)
silicon substrates. The germanium content in the CZ silicon was calculated to balance the stress
induced by boron doping. In principle, the co-doping of germanium and boron in the CZ silicon
substrate can be tailored to achieve a misfit dislocation-free epi-layer of the required
thickness. It is therefore expected that this solution for eliminating MDs in p/p+ silicon
wafers can be applied in volume production.
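
The idea of balancing the boron-induced stress with germanium can be illustrated with a first-order, linear (Vegard-type) estimate: boron contracts the silicon lattice and germanium expands it, so the net strain vanishes when β_B [B] + β_Ge [Ge] = 0. The sketch below is a back-of-envelope illustration and not the calculation used by Jiang et al. (2006). All names are hypothetical; the germanium coefficient is derived from the Si and Ge lattice constants assuming Vegard's law, while the boron coefficient and the boron concentration are assumed, order-of-magnitude values.

```python
SI_ATOMIC_DENSITY = 5.0e22   # atoms/cm^3 in silicon
A_SI = 5.431                 # angstrom, lattice constant of Si
A_GE = 5.658                 # angstrom, lattice constant of Ge

# Relative lattice expansion per Ge atom per cm^3, assuming Vegard's law.
BETA_GE = (A_GE - A_SI) / A_SI / SI_ATOMIC_DENSITY   # cm^3, positive

# ASSUMED contraction coefficient for boron; illustrative value only.
BETA_B = -5.0e-24                                     # cm^3, negative


def lattice_strain(c_boron, c_germanium, beta_b=BETA_B, beta_ge=BETA_GE):
    """Linear estimate of the relative lattice-constant change."""
    return beta_b * c_boron + beta_ge * c_germanium


def compensating_germanium(c_boron, beta_b=BETA_B, beta_ge=BETA_GE):
    """Ge concentration that cancels the boron-induced strain to first order."""
    return -beta_b * c_boron / beta_ge


C_BORON = 1.0e19  # cm^-3, hypothetical heavy boron doping level
c_ge = compensating_germanium(C_BORON)
print(f"[Ge] needed: {c_ge:.2e} cm^-3")
print(f"residual strain: {lattice_strain(C_BORON, c_ge):.1e}")
```

With these assumed coefficients several germanium atoms are needed per boron atom; a different boron coefficient changes that ratio proportionally.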
Fig. 21 shows optical images of the etched interface of p/p+ epi-wafers with an 11 µm thick
epi-layer grown on the conventional heavily boron-doped and the Ge-B-co-doped substrates,
respectively. As can be seen, in the p/p+ epi-wafer grown on the conventional heavily
boron-doped substrate there were three sets of MDs on the etched interface, which could even be
distinguished by the naked eye under a spotlight. In contrast, there were no MDs in the p/p+
epi-wafer grown on the Ge-B-co-doped substrate. The MDs in p/p+ epi-wafers can thus be avoided
by using Ge-B-co-doped substrates. Furthermore, a much thicker epi-layer can be grown on the
Ge-B-co-doped substrate without misfit dislocations. Fig. 22 shows both cross-sectional and
plan-view optical images of the etched silicon samples. Fig. 22(a) reveals that, in the p/p+
epi-wafer grown on the conventional heavily B-doped substrate, the MDs penetrated into the
epi-layer, and in the plan-view image of the etched interface the triangularly intersecting MDs
are clearly visible [Fig. 22(c)]. By contrast, for the p/p+ epi-wafers grown on the
Ge-B-co-doped silicon substrate, MDs could hardly be observed [Figs. 22(b) and 22(d)].

Fig. 21. Plan-view optical images of the etched interface in the 11 µm thick p/p+ epi-wafers 
using (a) the conventional heavily boron-doped substrate and (b) the Ge-B-co-doped substrate. 
(Jiang et al., 2006) 




Fig. 22. Cross-sectional optical images of the 50 µm thick p/p+ epi-wafers grown using (a) the 
conventional heavily boron-doped substrate and (b) the Ge-B-co-doped substrate, and plan-view 
optical images of the 50 µm thick p/p+ epi-wafers grown using (c) the conventional heavily 
boron-doped substrate and (d) the Ge-B-co-doped substrate (Jiang et al., 2006). 



6.2 Improved internal gettering capability 

Double-side mirror-polished wafers are being adopted in the industrial manufacturing of
large-diameter CZ silicon, such as 300 mm wafers, because of the stricter requirements on wafer
surface flatness. Therefore, external gettering processes (such as sand blasting and
polycrystalline silicon deposition) on the backside of CZ silicon wafers will become obsolete
and be replaced by internal gettering (IG) processes based on the simultaneous formation of
high-density BMDs in the bulk and a thin defect-free denuded zone (DZ) in the sub-surface region
of wafers, as illustrated in Fig. 23(c) (Chen & Yang, 2009). However, with the ever-decreasing
feature size of integrated circuits, the thermal budget for advanced devices is being reduced to
improve device characteristics; meanwhile, the application of the magnetic-field CZ growth
method to large-diameter crystals leads to a reduction of the oxygen concentration in silicon.
Both trends reduce the density of BMDs, which act as gettering sites for metallic contamination.

Fig. 23 illustrates a model of the influence of germanium on the generation of the IG structure
in CZ silicon wafers. Generally, for an effective IG, both a high density of BMDs and a suitable
DZ width can be generated in CZ silicon doped with certain impurities, so as to improve the IG
capability for metal contamination and hence the quality of IC devices. Compared with
conventional CZ silicon, germanium atoms in GCZ silicon induce germanium-related complexes,
which then seed oxygen precipitation in the bulk during IG processing based on either CFA or
RTA. Both a good-quality defect-free DZ in the sub-surface region and a BMD region of higher
density in the bulk can be obtained simultaneously in GCZ silicon. Generally, the DZ in GCZ
silicon shrinks and is slightly narrower than that of the CZ silicon wafer, which might be
ascribed to the denser small precipitates located at the boundary between the DZ and the BMD
region. Nevertheless, it has also been shown that DZs are still present in GCZ silicon wafers
after certain critical anneals despite the shrinkage in width (Chen et al., 2007c).

Fig. 23. Schematic illustrations of the internal gettering (IG) structure in GCZ silicon wafers. 
(a)-(d) show the normal steps generating the IG structure in a silicon wafer and the gettering 
capability. As an example, (e)-(f) show the effects of germanium on the IG structure and 
capability. (Chen & Yang, 2009) 

Fig. 24. Representative cross-sectional optical micrographs of etched normal CZ and GCZ 
silicon wafers: (a) CZ, before Cu in-diffusion; (b) GCZ, before Cu in-diffusion; 
(c) CZ, after Cu in-diffusion; and (d) GCZ, after Cu in-diffusion. (Chen et al., 2007c) 

The IG capability for metallic contamination can therefore be enhanced by intentional germanium
doping of CZ silicon wafers. Taking copper contamination as an example (Chen et al., 2007c),
Fig. 24 shows cross-sectional optical micrographs of etched normal CZ and GCZ silicon wafers
before and after Cu in-diffusion at 1100°C/1h. As can be seen, denser BMDs of smaller size,
decorated with denser Cu precipitates, were present in the bulk of the GCZ silicon wafers in
comparison with the CZ silicon, indicating a stronger IG capability in the GCZ silicon. A
possible explanation is that the denser gettering sites (even though smaller) lower the total
interstitial Cu concentration in the wafer bulk, so that more Cu atoms are gettered in the GCZ
silicon. It is noted that fairly clean DZs near the surfaces remained in both kinds of silicon
wafers, which ensures the integrity of the wafer sub-surface for device fabrication.

7. Summary 

We have illustrated the effects of germanium doping in CZ silicon on mechanical strength,
oxygen-related donors, oxygen precipitation and void defects. It has been established that the
mechanical strength of silicon wafers can be improved by intentional germanium doping, which
improves the production yield of wafers. It is also found that germanium suppresses the
generation of TDs, which stabilizes the electrical properties of wafers. More importantly,
germanium has been found to suppress the formation of void defects and to make them easier to
annihilate during high temperature treatments. Moreover, oxygen precipitation is enhanced by
germanium doping, and therefore the IG capability is improved. Additionally, compared with
nitrogen-doped CZ silicon, the germanium doping level in CZ silicon is much easier to control,
and no electrically active centers such as shallow thermal donors are introduced. Owing to
these novel properties, GCZ silicon is considered able to satisfy the higher requirements of
ULSI.

1. Introduction 

Today, disease in excised tissue is interpreted by analyzing biopsy specimens with a tabletop
microscope [1]. While this method is effective, the process can be limited by sampling error,
processing costs, and preparation time. In addition, the interpretive accuracy of the specimens
can be affected by artefacts associated with tissue sectioning, paraffin embedding, and
histochemical staining. Thus, considerable effort has gone into the development of new methods
that perform real-time in vivo imaging with sub-cellular resolution.

Confocal microscopy is a powerful optical imaging method that can achieve sub-cellular
resolution in real time. The technique of optical sectioning provides clear images from
"optically thick" biological tissues; such images have previously been collected with large
tabletop instruments [2, 3]. Confocal microscopes can be used to collect either reflectance or
fluorescence images to identify morphological or molecular features of cells and tissues,
respectively. Moreover, images in both modalities can be captured simultaneously with complete
spatial registration. This approach uses a "pinhole" placed between the objective lens and the
detector to allow only the light that originates from within a tiny focal volume below the
tissue surface to be collected. For miniature instruments, the core of an optical fiber is used
as the "pinhole."

Recently, significant progress has been made in the development of endoscope-compatible 
confocal imaging instruments for visualizing inside the human body. This direction has 
been accelerated by the availability, variety and low cost of optical fibers, scanners, and light 
sources, in particular, semiconductor lasers. These methods are being developed for use in 
the clinic as well as in small animal imaging facilities. The addition of a miniature real-time, 
high resolution imaging instrument can help guide tissue biopsy and reduce pathology 
costs. However, these efforts are technically challenging because of the demanding 
performance requirements for small instrument size, high image resolution, deep tissue 
penetration depths, and fast frame rates. 

The performance parameters for miniature in vivo confocal imaging instruments are governed by
the specific application. An important goal is the early detection and image-guided therapy of
disease in hollow organs, including the colon, esophagus, lung, oropharynx, and cervix.
Applications can also be found in better understanding the molecular mechanisms of disease in
small animals. In particular, localization of pre-malignant (dysplastic) lesions in the
digestive tract can guide tissue biopsy for early detection and prevention of cancer. In
addition, visualization of overexpressed molecular targets in small animal models can lead to
the discovery of new drugs.

Fig. 1. Dysplasia represents a pre-malignant condition in the epithelium of hollow organs, 
such as the colon and esophagus. The dual axes confocal architecture has high dynamic 
range that is suitable for imaging in the vertical cross-sectional plane to visualize disease 
processes with greater tissue penetration depths. 

As shown in Fig. 1, dysplasia originates in the epithelium and represents an important step in
the transformation of normal mucosa to carcinoma. Dysplasia has a latency period of
approximately 7 to 14 years before progressing to cancer and offers a window of opportunity for
endoscopic evaluation of patients who are at increased risk of developing cancer. The early
detection and localization of dysplastic lesions can guide tissue resection and prevent future
cancer progression. Dysplastic glands can be present from the mucosal surface down to the
muscularis. Thus, an imaging depth of ~500 µm is sufficient to evaluate most early epithelial
disease processes.

On reflectance imaging, sub-cellular resolution (typically <5 µm) is needed to identify 
nuclear features, such as nuclear-to-cytoplasm ratio. On fluorescence imaging, high contrast 
is needed to distinguish between the target and background. With both modalities, a fast 
imaging frame rate (>4 Hz) is necessary to avoid motion artefact. 



2. Single axis confocal architecture 

A. Configuration of optics 

Recent advances in the development of microlenses and miniature scanners have resulted in 
the development of fiber optic coupled instruments that are endoscope compatible with 
high resolution, including single [4-8], and multiple fiber [7, 9] strategies. Different methods 
of scanning are also being explored [10-14]. 

All of these endoscope compatible designs use a single axis design, where the pinhole (fiber)
and objective are located along one main optical axis. A high NA objective is used to achieve
sub-cellular resolution and maximum light collection, and the same objective is used for both
the illumination and collection of light. In order to scale down the dimensions of these
instruments for endoscope compatibility, the diameter of the objective must be reduced to
~5 mm or less. As a consequence, the working distance (WD) as well as the field-of-view (FOV)
is also decreased, as shown by the progression of the 3 different objectives in Fig. 2. The
tissue penetration depth also decreases, and is typically inadequate to assess the tissue down
to the muscularis, which is located at a depth of ~500 µm and is an important landmark for
defining the early presence of epithelial cancers.

Fig. 2. For endoscope compatibility, the diameter of a single axis confocal microscope must 
be scaled down in size (A → B → C), resulting in a reduced working distance and limited 
tissue penetration depth. 

B. Resolutions 

For the conventional single axis architecture, the transverse, Δr_s, and axial, Δz_s,
resolutions between full-width-half-power (FWHP) points for uniform illumination of the lenses
are defined by the following equations [3]:

$$ \Delta r_s = \frac{0.37\lambda}{n\sin\alpha} \approx \frac{0.37\lambda}{n\alpha}, \qquad \Delta z_s = \frac{0.89\lambda}{n(1-\cos\alpha)} \approx \frac{1.78\lambda}{n\alpha^2} \qquad (1) $$

where λ is the wavelength, n is the refractive index of the medium, α is the maximum
convergence half-angle of the beam, NA = n sin α is the numerical aperture, and sin α ≈ α for
low NA lenses. Eq. (1) implies that the transverse and axial resolutions vary as 1/NA and
1/NA², respectively. A resolution of less than 5 µm is adequate to identify sub-cellular
structures that are important for medical and biological applications. To achieve this
resolution in the axial dimension, the objective lens requires a relatively large NA (>0.4).
The optics can be reduced to the millimeter scale for in vivo imaging, but this requires a
sacrifice of resolution, FOV, or WD. Also, a high NA objective limits the available WD and
requires that the scanning mechanism be located in the pre-objective position, restricting the
FOV and further increasing sensitivity to off-axis aberrations.
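
The scaling of the single axis resolution with NA can be checked numerically. The sketch below simply evaluates the exact (non-approximated) forms of the two expressions in Eq. (1) for a given wavelength, refractive index, and NA. It is an illustrative sketch: the function name is hypothetical, and the refractive index n = 1.33 (water-like tissue) is an assumption rather than a value stated in the text.

```python
import math


def single_axis_resolution(wavelength_um, n, na):
    """Return (transverse, axial) FWHP resolution in micrometers from Eq. (1)."""
    if not 0.0 < na < n:
        raise ValueError("NA must satisfy 0 < NA < n")
    alpha = math.asin(na / n)  # half-angle, from NA = n * sin(alpha)
    transverse = 0.37 * wavelength_um / (n * math.sin(alpha))
    axial = 0.89 * wavelength_um / (n * (1.0 - math.cos(alpha)))
    return transverse, axial


# Illustration values: 488 nm excitation (the wavelength of the commercial
# systems described in this chapter) and an ASSUMED index n = 1.33.
for na in (0.2, 0.4, 0.6):
    dr, dz = single_axis_resolution(0.488, 1.33, na)
    print(f"NA = {na:.1f}: transverse = {dr:.2f} um, axial = {dz:.2f} um")
```

Because the axial term depends on 1 - cos α, it degrades much faster than the transverse term as the NA is lowered, which is why the axial dimension sets the NA requirement.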

C. Commercial systems 

Two endoscope compatible confocal imaging systems are commercially available for clinical use.
The EC-3870K (Pentax Precision Instruments, Tokyo, Japan) has an integrated design in which a
confocal module (Optiscan Pty Ltd, Victoria, Australia) is built into the insertion tube of the
endoscope, resulting in an overall diameter of 12.8 mm, as shown in Fig. 3a [15]. This module
uses the single axis optical configuration, in which a single mode optical fiber is aligned
on-axis with an objective that has an NA ≈ 0.6. Scanning of the distal tip of the optical fiber
is performed mechanically by coupling the fiber to a tuning fork that vibrates at resonance.
Axial scanning is performed with a shape memory alloy (nitinol) actuator that can translate the
focal volume over a distance of up to 250 µm below the tissue surface. Excitation is provided at
488 nm (peak absorption of fluorescein) by a semiconductor laser, and transverse and axial
resolutions of 0.7 and 7 µm, respectively, have been achieved. The images are collected at a
frame rate of either 0.8 or 1.6 Hz for a FOV of either 1024x1024 or 1024x512 pixels,
respectively. The dimension of the confocal instrument by itself is ~5 mm. When a suspicious
lesion is identified, the confocal window located on the distal tip is placed into contact with
the tissue to collect images. A separate instrument channel can be used to obtain pinch biopsies
of tissue.



Fig. 3. a) The EC-3870K (Pentax) has a confocal module (Optiscan) integrated into the 
endoscope insertion tube. b) The Cellvizio® GI is a confocal miniprobe that passes through 
the instrument channel of the endoscope. 

The Cellvizio® GI (Mauna Kea Technologies, Paris, France) uses a set of miniprobes that range
in diameter from 1.5 to 2.5 mm and pass through the standard instrument channel of medical
endoscopes, as shown in Fig. 3b. This instrument moves independently of the endoscope, and its
placement onto the tissue surface can be guided by the conventional white light image [8, 15].
The miniprobe consists of a fiber bundle with ~30,000 individual fibers that is aligned on-axis
with an objective that has an NA ≈ 0.6. The core of each individual fiber acts as a collection
pinhole to reject out-of-focus light. Scanning is performed at the proximal end of the bundle
in the instrument control unit with a 4 kHz oscillating mirror for horizontal lines and a 12 Hz
galvo mirror for frames. In this design, axial scanning cannot be performed. Instead, separate
miniprobes that have different working distances are needed to optically section at different
depths. Excitation is provided at 488 nm, and the transverse and axial resolutions of these
instruments range from 2.5 to 5 µm and from 15 to 20 µm, respectively. Images are collected at a
frame rate of 12 Hz with a FOV of either 600x500 µm² or 240x200 µm².



3. Dual axes confocal architecture 

D. Configuration of optics 

The miniaturization techniques described in the previous section use a conventional single axis
confocal architecture in which the objective and optical fiber are aligned along the same
optical axis. To overcome some of the limitations of this architecture for endoscope
compatibility and in vivo imaging, we have developed the novel dual axes confocal configuration
shown in Fig. 4. We use two fibers oriented along the separate optical axes of different low NA
objectives to spatially separate the light paths for illumination and collection [16, 17]. The
region of overlap between the two beams (crossed at a half angle θ from the midline) defines
the focal volume, and hence the resolution, and can achieve sub-cellular dimensions. Light
scattered by tissue along the illumination path (blue cone) has a very low probability of
entering the low NA collection objective (green cone), so a significant improvement in the
dynamic range of detection can be achieved.







Fig. 4. Novel dual axes confocal architecture uses separate optical fibers and low NA lenses 
for off-axis light collection, achieving long working distance, high dynamic range, and 
scalability while preserving resolution. 

Furthermore, the low NA objectives enable an increased working distance so that the 
scanning mirror can be placed on the distal (tissue) side of the lens (post-objective position), 
resulting in less sensitivity to off-axis aberrations [17]. In this configuration, the beams 
always pass through the low NA objectives on axis, resulting in a diffraction-limited focal 
volume that can then be scanned over a large FOV, limited by the performance of the 
scanner rather than by the optics. This design feature allows for the instrument to be scaled 
down in size to millimeter dimensions for compatibility with medical endoscopes without 
loss of performance. 

We first develop the theory to explain the unique performance features of the dual axes 
confocal architecture by characterizing the point-spread function (PSF) and dynamic range. 
Then, we demonstrate the scaled down implementation of this configuration in miniature 
prototypes. Because of the challenges of packaging in such a small form factor, we first 
demonstrate a handheld (10 mm diameter) instrument and then an endoscope-compatible 
(5.5 mm diameter) prototype, using the same MEMS mirror and scanhead optics. 

E. Definition of coordinates 

The coordinates for the dual axes confocal configuration are shown in Fig. 5. The 
illumination (IO) and collection (CO) objectives represent separate low NA lenses. The 
maximum convergence half-angles of the illumination and collection beams are represented 
by $\alpha_i$ and $\alpha_c$, respectively. The separate optical axes are defined to cross the z-axis ($z_d$) at an 
angle $\theta$. The main lobe of the PSF of the illumination objective is represented by the light 
gray oval. This lobe has a narrow transverse but a wide axial dimension. 







Fig. 5. Coordinates for dual axes confocal configuration 

The main lobe of the PSF of the collection objective has the same shape but is 
symmetrically reflected about $z_d$, as represented by the dark gray oval. For dual axes, the 
combined PSF is given by the overlap of the two individual PSFs, represented by the 
black oval. This region is characterized by narrow transverse dimensions, $\Delta x_d$ and $\Delta y_d$ (out 
of the page), and by a significantly reduced axial dimension, $\Delta z_d$, which depends on the 
transverse rather than the axial dimension of the individual beams where they intersect. 

F. Point spread function 

The dual axes PSF can be derived using diffraction theory with paraxial approximations [18]. 
The coordinates for the illumination ($x_i, y_i, z_i$) and collection ($x_c, y_c, z_c$) beams are defined in 
terms of the coordinates of the main optical axis ($x_d, y_d, z_d$), and may be expressed as follows: 



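$$
\begin{aligned}
x_i &= x_d\cos\theta - z_d\sin\theta, & x_c &= x_d\cos\theta + z_d\sin\theta,\\
y_i &= y_d, & y_c &= y_d,\\
z_i &= x_d\sin\theta + z_d\cos\theta, & z_c &= -x_d\sin\theta + z_d\cos\theta.
\end{aligned}
\tag{2}
$$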
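
The coordinate change above is a pair of rotations by $+\theta$ and $-\theta$ about the $y_d$ axis. As a minimal sketch, it can be written as a helper that maps a point in the main-axis frame to both beam frames. The function name `to_beam_coords` is ours and is used only for illustration.

```python
import math

def to_beam_coords(xd, yd, zd, theta):
    """Map main-axis coordinates (xd, yd, zd) to the illumination and
    collection beam frames, following Eq. (2). theta is in radians."""
    c, s = math.cos(theta), math.sin(theta)
    illum = (xd * c - zd * s, yd, xd * s + zd * c)
    coll = (xd * c + zd * s, yd, -xd * s + zd * c)
    return illum, coll

# A point displaced 1 unit along x_d, with the beams crossed at 30 degrees.
illum, coll = to_beam_coords(1.0, 0.0, 0.0, math.radians(30))
print(illum, coll)
```

Because both maps are pure rotations, distances from the common focus are preserved, and a point on the $z_d$ axis appears at equal and opposite transverse offsets in the two beam frames.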



The maximum convergence half-angles of the focused illumination and collection beams in 
the sample media are represented as $\alpha_i$ and $\alpha_c$, respectively. The angle at which the two 
beams intersect the main optical axis is denoted as $\theta$. A set of general dimensionless 
coordinates may be defined along the illumination and collection axes, as follows [19]: 



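$$
\begin{aligned}
u_i &= k_i n z_i \sin^2\alpha_i, & u_c &= k_c n z_c \sin^2\alpha_c,\\
v_i &= k_i n \sqrt{x_i^2 + y_i^2}\,\sin\alpha_i, & v_c &= k_c n \sqrt{x_c^2 + y_c^2}\,\sin\alpha_c.
\end{aligned}
\tag{3}
$$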



The wavenumbers for illumination and collection are defined as $k_i = 2\pi/\lambda_i$ and $k_c = 2\pi/\lambda_c$, 
respectively, where $\lambda_i$ and $\lambda_c$ are the wavelengths, and $n$ is the index of refraction of the 
media. 

The amplitude PSF describes the spatial distribution of the electric field of the focused 
beams. Diffraction theory may be used to show that the PSFs of the illumination and 
collection beams are proportional to the Huygens-Fresnel integrals below [18]: 



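$$
U_i(v_i, u_i) \propto \int_0^1 W_i(\rho)\, J_0(\rho v_i)\, e^{-j u_i \rho^2/2}\, \rho\, d\rho \tag{4}
$$

$$
U_c(v_c, u_c) \propto \int_0^1 W_c(\rho)\, J_0(\rho v_c)\, e^{-j u_c \rho^2/2}\, \rho\, d\rho \tag{5}
$$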
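
To make these integrals concrete, the sketch below evaluates one of them numerically with a midpoint rule. It also checks a standard identity: in the focal plane ($u = 0$) with uniform weighting, the integral equals $J_1(v)/v$. The helper names, the power-series Bessel routine, and the step count are our own choices for illustration and are not taken from the cited work.

```python
import cmath
import math

def bessel_j(order, x, terms=40):
    """Power series for the Bessel function J_order(x), integer order >= 0.
    Adequate for the moderate arguments used in this sketch."""
    return sum((-1) ** m / (math.factorial(m) * math.factorial(m + order))
               * (x / 2.0) ** (2 * m + order) for m in range(terms))

def amplitude_psf(v, u, weight=lambda rho: 1.0, steps=2000):
    """Midpoint-rule evaluation of the integral in Eqs. (4) and (5).
    weight is the aperture weighting W(rho); the default is uniform."""
    d_rho = 1.0 / steps
    total = 0j
    for i in range(steps):
        rho = (i + 0.5) * d_rho
        total += (weight(rho) * bessel_j(0, rho * v)
                  * cmath.exp(-0.5j * u * rho * rho) * rho * d_rho)
    return total

v = 2.0
numeric = amplitude_psf(v, 0.0).real   # focal plane, uniform weighting
closed_form = bessel_j(1, v) / v       # known closed form of the integral
print(numeric, closed_form)
```

Passing a different `weight` function (for example a Gaussian) gives the apodized case, for which no simple closed form is used here.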



where $J_0$ is the Bessel function of order zero, and $\rho$ is a normalized radial distance variable at 
the objective aperture. The weighting function, $W(\rho)$, describes the truncation (apodization) 
of the beams. For uniform illumination, $W(\rho) = 1$. For Gaussian illumination, when the objectives 
truncate the beams at the $1/e^2$ intensity radius, the weighting function is $W(\rho) = e^{-\rho^2}$. 
In practice, the beams are typically truncated so that 99% of the power is transmitted. For a 
Gaussian beam with a radius ($1/e^2$ intensity) given by $w$, an aperture with diameter $\pi w$ 
passes ~99% of the power. In this case, the weighting function is given as follows: 

$$
W(\rho) = e^{-(\pi\rho/2)^2} \tag{6}
$$
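
The ~99% figure can be checked directly. For a Gaussian beam with $1/e^2$ intensity radius $w$, the fraction of power passed by a centered circular aperture of radius $a$ is $1 - e^{-2a^2/w^2}$, a standard result. The sketch below applies it to the two truncation choices discussed above; the function names are ours.

```python
import math

def transmitted_fraction(aperture_diameter, w):
    """Fraction of Gaussian beam power passed by a centered circular aperture.
    w is the 1/e^2 intensity radius, in the same units as the diameter."""
    a = aperture_diameter / 2.0
    return 1.0 - math.exp(-2.0 * a ** 2 / w ** 2)

def weight(rho, aperture_diameter, w):
    """Field amplitude weighting W(rho) at normalized aperture radius rho in [0, 1]."""
    a = aperture_diameter / 2.0
    return math.exp(-(rho * a / w) ** 2)

w = 1.0
print(transmitted_fraction(2.0 * w, w))      # truncation at the 1/e^2 radius
print(transmitted_fraction(math.pi * w, w))  # aperture diameter pi * w
```

With diameter $2w$ the weighting reduces to $e^{-\rho^2}$ and about 86% of the power is passed. With diameter $\pi w$ it reduces to Eq. (6) and about 99.3% is passed.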

For the single axis configuration, the illumination and collection PSFs at the focal plane ($u_i = 
u_c = 0$) are identical functions of the radial distance $r$, and can both be given by $U_s$ using 
the substitution $v = k n r \sin\alpha$, as follows: 

$$
U_s(v) \propto \int_0^1 W(\rho)\, J_0(\rho v)\, \rho\, d\rho \tag{7}
$$



The resulting signal at the detector $V$ from a point source reflector in the media is 
proportional to the power received, and is given by the square of the product of the 
overlapping PSFs as follows: 

$$
V = A\,|U_i U_c|^2 \tag{8}
$$

where A is a constant. 

Since the depth of focus of each individual beam, described by the exponential term in 
the integrals of Eqs. (4) and (5), is much larger than the transverse width, the 
exponential term may be neglected. As a result, the detector output $V_d$ for the dual 
axes configuration with uniform illumination ($W = 1$) is given as follows: 

$$
V_d \propto \left(\frac{2J_1(v_i)}{v_i}\right)^2 \left(\frac{2J_1(v_c)}{v_c}\right)^2 \tag{9}
$$



This expression can be combined with Eqs. (2) and (3) to derive the result for transverse and 
axial resolution with uniform illumination as follows [16]: 

$$
\Delta x_d = \frac{0.37\lambda}{n\alpha\cos\theta}; \quad \Delta y_d = \frac{0.37\lambda}{n\alpha}; \quad \Delta z_d = \frac{0.37\lambda}{n\alpha\sin\theta} \tag{10}
$$
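
The numerical factor 0.37 can be recovered from the detector response. Along $y_d$ the two beams see the same transverse offset, so $v_i = v_c = v$ and the response reduces to $(2J_1(v)/v)^4$. The sketch below finds the half-maximum point by bisection and converts it to a full width using $v = (2\pi/\lambda)\, n\, y\, \alpha$, with the small-angle approximation $\sin\alpha \approx \alpha$. The helper names and the series-based Bessel routine are our own.

```python
import math

def bessel_j1(v, terms=30):
    """Power series for J1(v); adequate for the small |v| used here."""
    return sum((-1) ** m / (math.factorial(m) * math.factorial(m + 1))
               * (v / 2.0) ** (2 * m + 1) for m in range(terms))

def response(v):
    """Detector response along y_d with v_i = v_c = v: (2 J1(v) / v)^4."""
    if v == 0.0:
        return 1.0
    return (2.0 * bessel_j1(v) / v) ** 4

def half_max_v(lo=0.0, hi=3.0, tol=1e-10):
    """Bisection for the v at which the response falls to one half.
    The response decreases monotonically on [0, 3]."""
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if response(mid) > 0.5:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

v_half = half_max_v()
# FWHM = 2 * y_half = coefficient * lambda / (n * alpha)
coefficient = 2.0 * v_half / (2.0 * math.pi)
print(round(coefficient, 2))  # 0.37
```

The same half-width applies along $x_d$ and $z_d$ after projecting the offset onto the beam's transverse plane, which introduces the factors $\cos\theta$ and $\sin\theta$.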

Note that for the dual axes configuration, the axial resolution is proportional to 1/NA, 
where NA = $n\sin\alpha \approx n\alpha$, rather than 1/NA², as is the case for the single axis design [3]. 
For example, with uniform illumination and the following parameters: $\alpha$ = 0.21 radians, 
$\theta$ = 30 degrees, $\lambda$ = 0.785 µm and $n$ = 1.4 for tissue, Eq. (10) gives a result for the dual 
axes configuration of $\Delta x_d$ = 1.1 µm, $\Delta y_d$ = 1.0 µm, and $\Delta z_d$ = 2 µm for the transverse and axial 
resolutions, respectively. Thus, sub-cellular resolution can be achieved in both the 
transverse and axial dimensions with the dual axes configuration using low NA optics but 
not with the single axis architecture. 
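
The worked example above can be reproduced in a few lines. The sketch below evaluates the three resolution expressions for the stated parameters; the function name is ours.

```python
import math

def dual_axes_resolution(wavelength_um, n, alpha, theta):
    """Transverse (x, y) and axial (z) FWHM resolution in micrometers for
    uniform illumination; alpha and theta are in radians."""
    base = 0.37 * wavelength_um / (n * alpha)
    return base / math.cos(theta), base, base / math.sin(theta)

dx, dy, dz = dual_axes_resolution(0.785, 1.4, 0.21, math.radians(30))
print(f"{dx:.1f} {dy:.1f} {dz:.1f}")  # 1.1 1.0 2.0
```

Because the axial term scales as $1/\sin\theta$ rather than with a second power of NA, increasing the crossing angle improves axial resolution at the cost of a slightly larger $\Delta x_d$.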

For an endoscope-compatible instrument, delivery of the illumination and collection light is 
performed with use of optical fibers and is more appropriately modeled by a Gaussian 
rather than a uniform beam. With this apodization, the detector response for the dual axes 
configuration from a point source reflector in the media, given by Eq. (9), may be solved 
numerically as a function of transverse ($x_d$ and $y_d$) and axial ($z_d$) dimensions. The integrals 
are calculated in Matlab, and use the weighting function with 99% transmission. In 
comparison, this model reveals a result of $\Delta x_d$ = 2.4 µm, $\Delta y_d$ = 2.1 µm, and $\Delta z_d$ = 4.2 µm for 
the transverse and axial resolutions, respectively. Thus, the use of optical fibers, modeled by 
a Gaussian beam, produces results that are slightly worse but still comparable to that of 
uniform illumination [19]. 

G. Dynamic range 

Differences in the dynamic range between the single and dual axes confocal configurations 
can also be illustrated with this model [18]. The calculated axial response for the single axis 
design with Gaussian illumination is shown by the dashed line in Fig. 6a, where optical 
parameters are used that achieve the same axial resolution (FWHM) of 4.2 µm. The result 
reveals that the main lobe falls off in the axial (z-axis) direction as $1/z^2$, and reaches a value 
of approximately -25 dB at a distance of 10 µm from the focal plane (z = 0). In addition, a 
number of side lobes can be appreciated. 

In comparison, the response for the dual axes configuration, shown by the solid line in Fig. 
6a, reveals that the main lobe rolls off in the axial (z-axis) direction as $\exp(-kz^2)$, and reaches 
a value of -60 dB at a distance of 10 µm from the focal plane (z = 0). Thus, off-axis 
illumination and collection of light in the dual axes architecture results in a significant 
improvement in dynamic range and in an exponential rejection of out-of-focus scattered 
light in comparison to that for single axis. This advantage allows for the dual axes 
configuration to collect images with deeper tissue penetration and with a vertical cross- 
section orientation. The transverse response with Gaussian illumination is shown in Fig. 6b. 

Fig. 6. Dynamic range of novel dual axes confocal architecture. a) The axial response of the 
single axis (dashed line) configuration falls off as $1/z^2$ and that for the dual axes (solid line) 
design falls off as $\exp(-kz^2)$, resulting in a significant improvement in dynamic range, 
allowing for vertical cross-sectional imaging to be performed. b) Transverse (X-Y direction) 
response. 

H. Post-objective scanning 

In confocal microscopes, scanning of the focal volume is necessary to create an image. In the 
single axis architecture, the high NA objectives used limit the working distance, thus the 
scan mirror is by convention placed on the pinhole (fiber) side of the objective, or in the pre- 
objective position, as shown in Fig. 7a. Scanning orients the beam at various angles to the 
optical axis and introduces off-axis aberrations that expand the focal volume. In addition, 
the FOV of pre-objective scanning systems is proportional to the scan angle and the focal 
length of the objective. The diameter of the objective limits the maximum scan angle, and as 
this dimension is reduced for endoscope compatibility, the focal length and FOV are also 
diminished. 

Fig. 7. a) For pre-objective scanning, illumination light is incident on the objective off-axis, 
resulting in more sensitivity to aberrations and a limited FOV. b) With post-objective 
scanning, the incident light is on-axis, resulting in less sensitivity to aberrations and a large 
FOV. Post-objective scanning is made possible by the long WD produced by the low NA objectives 
used in the dual axes architecture. 

In the dual axes configuration, the low NA objectives used create a long working distance 
that allows for the scanner to be placed on the tissue side of the objective, or in the post- 
objective position [17]. This design feature is critical for scaling the size of the instrument 
down to millimeter dimensions for in vivo imaging applications without losing performance. 
As shown in Fig. 7b, the illumination light is always incident on-axis to the objective. In the 
post-objective location, the scan mirror can sweep a diffraction-limited focal volume over an 
arbitrarily large FOV, limited only by the maximum deflection angle of the mirror. 
Moreover, the scanner steers the illumination and collection beams together with the 
intersection of the two beams oriented at a constant angle $\theta$ and with the overlapping focal 
volume moving without changing shape along an arc-line. This property can be 
conceptualized by regarding the dual axes geometry as being equivalent to two separate 
beams produced from two circles in the outer annulus of a high NA lens containing a central 
obstruction (or a large central hole). A flat scan mirror deflects both beams equally, and 
thereby preserves the overlapping region without introducing aberrations to the beams. 

I. Improved rejection of scattering 

In the dual axes confocal architecture, the off-axis collection of light significantly reduces the 
deleterious effects of tissue scattering on the dynamic range of detection and allows for 
deeper ballistic photons to be resolved [20]. These features provide the unique capacity to 
collect vertical cross-sectional images in the plane perpendicular to the tissue surface. This is 
the preferred view of pathologists because differences from the normal patterns of tissue 
differentiation are revealed in the direction from the lumen to the sub-mucosa. 

1. Optical configurations 

The improvement in rejection of light scattered by tissue can be illustrated by comparing the 
dynamic range of detection between the single and dual axes optical configurations with 
equivalent axial resolution, as shown in Fig. 8a and 8b. The incident beams are modeled 
with a Gaussian profile because this is representative of light delivered through an optical 
fiber. For the single axis configuration, this beam is focused into the tissue by an ideal lens 
(L1). A mirror (M) is embedded in the tissue at the focal plane (parallel to the x-y plane) of 
the objective lens. In this scheme, the rays that reflect from the mirror pass back through the 
lens L1, deflect at an angle off the beam splitter, and are focused by an ideal lens (L2) onto a 
pinhole detector. For the dual axes set-up, the incident Gaussian beam is focused into the 
tissue by an ideal lens (L3) with its axis oriented at an angle $\theta$ = 30° to the z-axis, and an 
ideal lens (L4) focuses the backscattered beam, with its axis z' at an angle -30° to the z-axis, 
onto the pinhole detector. As before, a mirror (M) with its plane perpendicular to the z-axis 
and passing through the coincident focuses of the lenses is embedded in the tissue to reflect 
the incident light to the detector. In both configurations, the lens system has a magnification 
of 1 from the focal plane to the pinhole detector. 






Fig. 8. a) single axis and b) dual axes optical configurations are used to evaluate the axial 
response at the detector. 

In order to achieve an equivalent -3 dB axial resolution (FWHM), the NAs for the single and 
dual axes configurations are defined to be 0.58 and 0.21, respectively. From diffraction 
theory, discussed above, the theoretical transverse and axial resolutions for the PSF for dual 
axes at a wavelength $\lambda$ = 633 nm with an average tissue refractive index of 1.4 and NA = 0.21 
are found to be $\Delta x$ = 1.16 µm, $\Delta y$ = 1.00 µm, and $\Delta z$ = 2.00 µm. The mirror is placed at a 
distance of 200 µm below the tissue surface in the focal plane of the objective lenses for both 
the single and dual axes configurations. This depth is representative of the imaging distance 
of interest in the epithelium of hollow organs. The calculations performed to analyze the 
effects of tissue scattering on light are based on Monte Carlo simulations using a non- 
sequential ray tracing program (ASAP® 2006, Breault Research Organization, Tucson, AZ). 
Three assumptions are made in this simulation study: 1) multiple scattering of an incoherent 
beam dominates over diffraction effects, 2) the non-scattering optical medium surrounding 
the lenses and the tissue (the scattering medium) is index matched to eliminate aberrations, 
and 3) absorption is not included to simplify this model and because there is much larger 
attenuation due to the scattering of ballistic photons. 

2. Mie scattering analysis 

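We use Mie theory with the Henyey-Greenstein phase function $p(\theta)$ to model the angular 
dependence of tissue scattering, as follows [21, 22]: 

$$
p(\theta) = \frac{1}{4\pi}\,\frac{1 - g^2}{\left(1 + g^2 - 2g\cos\theta\right)^{3/2}} \tag{11}
$$

where $g$, the anisotropy factor, is defined as the mean cosine of the scattering angle, 
$g = \langle\cos\theta\rangle$. 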
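
Two properties of the Henyey-Greenstein phase function are easy to confirm numerically. With the $1/4\pi$ prefactor it integrates to one over the full sphere, and its mean cosine equals the anisotropy factor $g$. The sketch below checks both for $g = 0.81$, the value used for the tissue phantom in this section. The function names and the midpoint integration scheme are our own.

```python
import math

def henyey_greenstein(theta, g):
    """Henyey-Greenstein phase function, normalized over the full sphere."""
    return (1.0 - g * g) / (
        4.0 * math.pi * (1.0 + g * g - 2.0 * g * math.cos(theta)) ** 1.5)

def sphere_average(func, g, steps=20000):
    """Midpoint-rule integral of func(theta) * p(theta) over solid angle."""
    d_theta = math.pi / steps
    total = 0.0
    for i in range(steps):
        theta = (i + 0.5) * d_theta
        total += (func(theta) * henyey_greenstein(theta, g)
                  * 2.0 * math.pi * math.sin(theta) * d_theta)
    return total

g = 0.81
norm = sphere_average(lambda t: 1.0, g)   # should be close to 1
mean_cos = sphere_average(math.cos, g)    # should be close to g
print(round(norm, 3), round(mean_cos, 3))
```

A value of $g$ near 1 means scattering is strongly forward-peaked, which is why many scattered photons continue roughly along the illumination path.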

Fig. 9. Distributions of photon flux in tissue scattering model. The peak value of multiple 
scattered photons for a) single axis is co-located with the confocal pinhole while that for b) 
dual axes is separated by ~50 µm. As a consequence, the ballistic photons for dual axes 
result in a greater signal-to-noise ratio. 

Given the average scatterer size, refractive index, and concentration, the attenuation 
coefficient $\mu_s$ and anisotropy $g$ are determined and provided to the ASAP program as 
simulation parameters. For a tissue phantom composed of polystyrene spheres with a 
diameter of 0.48 µm, refractive index 1.59, and a concentration of 0.0394 spheres/µm³ in 
water, the values $g$ = 0.81 and $\mu_s$ = 5.0 mm$^{-1}$ at $\lambda$ = 633 nm are calculated from Mie theory [23]. 
For single axis, $P(y')$ is defined as the photon flux distribution along the y'-axis at the 
detector. The photon flux can be normalized by defining $P^*(y') = P(y')/P_{max}$, where $P_{max}$ is 
the maximum flux. The normalized flux $P^*(y')$ consists of ballistic (signal) and multiple 
scattered (noise) photons, as shown in Fig. 9a [24]. The maximum flux for both the signal 
and noise components arrives at the center of the detector. A confocal pinhole placed in front of 
the detector can filter out some but not all of this "noise," resulting in a reduced signal-to- 
noise ratio (SNR). For dual axes, the detector is angled off the optical axis by 30 deg. $P(x')$ is 
defined as the photon flux distribution along the x'-axis at the detector. The photon flux can 
be normalized by defining $P^*(x') = P(x')/P_{max}$, where $P_{max}$ is the maximum flux at the 
detector. Fig. 9b shows that the normalized photon flux distribution for dual axes also exhibits 
ballistic and multiple scattered components. However, for dual axes, the peak flux of 
multiple scattered photons arrives ~50 µm lateral to the center of the detector where the 
ballistic photons arrive, a consequence of off-axis collection. Thus, there is much less "noise" 
for the confocal pinhole (diameter ~1 µm) to filter out, resulting in a higher SNR and 
dynamic range. 

3. Improvement in dynamic range 

An implication of this result is that the dual axes configuration has improved dynamic range 
compared to that of single axis. This difference can be quantified by determining the axial 
response at the detector. This can be done by calculating the photon flux $f(\Delta z)$ as the mirror 
is displaced along the z-axis in the tissue. The flux is calculated using Monte-Carlo 
simulations in ASAP with the mirror positioned in the range -10 µm < $\Delta z$ < 10 µm with 
respect to the focal plane at z = 0, which is located at 200 µm below the tissue surface. The 
flux is then normalized according to $F(\Delta z) = f(\Delta z)/f(0)$. The axial response is shown in Fig. 
10a for various pinhole diameters D, including 1, 2 and 3 µm, which correspond to typical 
fiber core dimensions. Note that for each pinhole diameter, the dual axes (DA) configuration 
has significantly better dynamic range than that of single axis (SA). In addition, the 
introduction of tissue scattering results in a reduction of the dynamic range compared to 
that found in free space, as shown by Fig. 6a. 




Fig. 10. Axial response for single and dual axes geometries. The dual axes (DA) 
configuration has a much greater dynamic range than that for single axis (SA) given 
different a) pinhole diameters (1, 2, and 3 µm) and b) optical lengths L (4.8, 6.4 and 8.0). 

We can also determine the axial response of the detector for various optical lengths in tissue. 
This analysis reveals differences in the dynamic range between the single and dual axes 
configurations for tissues with various scattering properties. The total optical length $L$ is 
defined as twice the product of the scattering coefficient $\mu_s$ and the tissue depth $t$, or $L = 
2\mu_s t$. The factor of two originates from the fact that the total path length is twice the tissue 
depth. The axial response is shown in Fig. 10b for various optical lengths L, including 4.8, 
6.4, and 8.0. Note that for each optical length L, the dual axes (DA) configuration has 
significantly better dynamic range than that of single axis (SA). These values of L are typical 
parameters of gastrointestinal epithelium. At $\lambda$ = 633 nm, $\mu_s$ is about 7 mm$^{-1}$ for esophagus 
tissue [24] and about 20 mm$^{-1}$ for normal colon mucosa [25]. The range of tissue depths 
spanned by L = 4.8 to 8 for esophagus and colon is 340 µm to 570 µm and 120 µm to 200 µm, 
respectively. In addition, these results show that for single axis only minimal changes occur 
in the dynamic range with approximately a factor of 2 difference in optical thickness L, 
while for dual axes significant changes occur over this thickness range. Furthermore, 
scattering does not appear to alter the FWHM of the axial response for either single or dual 
axes over this range of lengths. 
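
The depth ranges quoted above follow from inverting the definition of the optical length, $t = L/(2\mu_s)$. The sketch below reproduces them. It also prints $e^{-L}$, the fraction of light that remains unscattered over the round trip under a simple Beer-Lambert assumption. That last quantity is our illustration of why the ballistic signal is weak and is not a value reported in the text.

```python
import math

def depth_um(optical_length, mu_s_per_mm):
    """Tissue depth t in micrometers from L = 2 * mu_s * t."""
    return optical_length / (2.0 * mu_s_per_mm) * 1000.0

def ballistic_fraction(optical_length):
    """Unscattered fraction over the round trip (Beer-Lambert assumption)."""
    return math.exp(-optical_length)

for tissue, mu_s in (("esophagus", 7.0), ("colon", 20.0)):
    depths = [round(depth_um(L, mu_s)) for L in (4.8, 6.4, 8.0)]
    print(tissue, depths)

for L in (4.8, 6.4, 8.0):
    print(L, f"{ballistic_fraction(L):.1e}")
```

For esophagus the depths come out near 343, 457, and 571 µm, and for colon 120, 160, and 200 µm, consistent with the rounded ranges given above.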

4. Geometric differences produced by off-axis detection 

The superior axial response of the dual axes confocal architecture has a simple geometric 
explanation. When the mirror moves away from the focal plane by $\pm\Delta$, the centroid of the 
beam is steered away from the optical axis by $\pm 2\Delta\sin\theta$ from where the center of the pinhole 
is located [20]. Even taking into consideration diffraction and the broadening of the out-of- 
focus beam, the beam intensity decreases exponentially when $\Delta > D/2$ (for $\theta$ = 30°). But in 
the single axis case, many of the photons scattered near the vicinity of the focal plane ($\pm\Delta$) 
are collected by the detector through the pinhole. Thus, the spatial filtering effect by a 
pinhole for the single axis configuration is not as effective as that for dual axes. The 
implication of this effect for imaging deep in tissue is evident. In the single axis case, many 
of the multiple scattered photons that arrive from the same direction as that of the ballistic 
photons, starting from the surface to deep within the tissue, are collected by the detector 
despite the presence of a pinhole to filter the out-of-focus light. This explains why in Fig. 9a 
the single axis configuration has a large noise component alongside the ballistic component. 
Thus, the dual axes confocal architecture provides optical sectioning capability that is 
superior to that of the conventional single axis design in terms of SNR and dynamic range, 
and this result can be generalized to a range of relevant pinhole sizes. As a result, the dual 
axes architecture allows for imaging with greater tissue penetration depth, thus is capable of 
providing images in the vertical cross-section with high contrast. The implementation of the 
dual axes confocal configuration to an endoscope compatible instrument for collection of 
both reflectance and fluorescence has significant implications for in vivo imaging by 
providing both functional and structural information deep below the tissue surface. 
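
The geometric argument in this section can be put into numbers. When the mirror is displaced by $\Delta$ from the focal plane, the centroid of the returning beam shifts laterally by $2\Delta\sin\theta$, and the pinhole begins to miss the centroid once this shift exceeds the pinhole radius $D/2$. The sketch below is a purely geometric check that ignores the finite beam width and diffraction; the function names are ours.

```python
import math

def centroid_shift(delta_um, theta):
    """Lateral shift 2 * delta * sin(theta) of the reflected beam centroid
    for a mirror displaced by delta from the focal plane (theta in radians)."""
    return 2.0 * delta_um * math.sin(theta)

def misses_pinhole(delta_um, pinhole_diameter_um, theta):
    """True when the centroid lands outside a pinhole of the given diameter."""
    return abs(centroid_shift(delta_um, theta)) > pinhole_diameter_um / 2.0

theta = math.radians(30)
for delta in (0.5, 1.0, 2.0, 5.0):
    print(delta, round(centroid_shift(delta, theta), 3),
          misses_pinhole(delta, 3.0, theta))
```

At $\theta$ = 30° the shift equals $\Delta$ itself, which is why the condition reduces to $\Delta > D/2$. In the single axis case $\theta = 0$, so the centroid never leaves the pinhole regardless of the mirror position.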

4. Tabletop dual axes confocal imaging instruments 

The dual axes confocal architecture was first implemented as a tabletop instrument using 
readily available optical components to demonstrate the proof of concept of off-axis 
illumination and collection with post-objective scanning. In particular, the primary 
advantages of the dual axes configuration including high dynamic range and deep tissue 
penetration are revealed by vertical cross-sectional images with either reflectance or 
fluorescence. The combination of these two imaging modes forms a powerful strategy for 
integrating structural with functional information. 

The dual axes optical design incorporates a solid immersion lens (SIL) made from a fused- 
silica hemisphere at the interface where the two off-axis beams meet the tissue. This 
refractive element minimizes spherical aberrations that occur when light undergoes a step 
change in refractive index between two media. The curved surface of the SIL provides a 
normal interface for the two beams to cross the air-glass boundary. Fused silica is used 
because its index of refraction of n = 1.45 is closely matched to that of tissue. Note that as the 
beams are scanned away from their neutral positions, they will no longer be incident normal to the 
surface of the SIL and small aberrations will occur. Another feature of the SIL is that its 
curved surface increases the effective NA of the beams in the tissue by a factor of n, the 
index of refraction, and produces higher resolution and light collection efficiency. On the 
other hand, the SIL acts to reduce the scanning displacement of the beams in the tissue by a 
factor of 1/n so that larger deflections are needed to achieve the desired scan range. 

J. Horizontal cross-sectional imaging instrument 




Fig. 11. Horizontal cross-sectional dual axes images ex vivo. a) Squamous esophageal 
mucosa collected at z = µm with $\lambda$ = 488 nm reveals sub-cellular features, including cell 
nuclei (arrows) and membrane (arrowhead), scale bar 20 µm. b) Normal colonic mucosa at z 
= 150 µm with $\lambda$ = 1.3 µm illumination reveals circular crypts with colonocytes surrounding the 
lumen and lamina propria (LP) filling space in between the crypts, scale bar 50 µm. 

Reflectance imaging takes advantage of subtle differences in the refractive indices of tissue 
micro-structures to generate contrast. The backscattered photons can provide plenty of 
signal to overcome the low NA objectives used for light collection in the dual axes 
configuration. The first reflectance images were collected with a tabletop system that used a 
488 nm semiconductor laser that delivered illumination into a single mode optical fiber that 
was focused by a set of collimating lenses with NA = 0.16 to a spot size with ~400 µW of 
power [16]. These parameters produced a transverse and axial resolution of 1.1 and 2.1 µm, 
respectively. The reflected light was collected by a complementary set of optics. The off-axis 
illumination and collection was performed at $\theta$ = 30° to the main optical axis. Reflectance 
images were collected in horizontal cross-sections of freshly excised specimens of esophagus 
ex vivo. As shown in Fig. 11a, the cell membrane and individual nuclei of squamous (normal) 
esophageal mucosa can be appreciated in the image collected at z = µm, scale bar 20 µm. 

Much greater image contrast can be achieved with fluorescence imaging where the use of 
optical reporters, such as GFP, and exogenous probes can reveal overexpression of 
molecular targets. The same tabletop dual axes microscope was also used to collect 
fluorescence images with a long pass filter to block the excitation light and a photomultiplier 
tube (PMT) for detection [26]. In Fig. 11b, a fluorescence image of the cerebellum of a 
transgenic mouse that constitutively expresses GFP under the control of a β-actin promoter 
at a depth of z = 30 µm is shown, scale bar 50 µm. Purkinje cell bodies (arrows) can be seen 
as large round structures aligned side by side in a row, separating the granule layer and the 
molecular layer. 

K. Vertical cross-sectional imaging instrument 

1. Reflection imaging mode 

In order to collect vertical cross-sectional images, heterodyning can be used to provide a 
coherence gate that filters out illumination photons that are multiply-scattered and travel 
over longer optical paths within the tissue [17]. This approach is demonstrated with a fiber 
optic Mach-Zehnder interferometer, shown in Fig. 12a. A broadband near-infrared source 
produces light centered at $\lambda$ = 1345 nm with a 3 dB bandwidth of 35 nm and a coherence 
length in tissue of ~50 µm. A fiber coupler directs ~99% of the power to the illumination 
path, which consists of a single mode optical fiber (SMF1) with a collimating (CL1) and 
focusing lens (FL1) with NA = 0.186. The axes of illumination and collection are oriented at 
$\theta$ = 30° to the midline. Light reflected from the tissue is collected by the second set of 
focusing (FL2) and collimating (CL2) lenses into another single mode fiber (SMF2). The lens 
and fiber parameters are the same for both the illumination and collection beams. The fiber 
optic coupler directs ~1% of the source into a reference beam which is frequency shifted by 
an acousto-optic modulator at 55 MHz for heterodyne detection. An adjustable optical delay 
is used to increase the signal by matching the optical path length of the reference beam to 
that of the ballistic photons. In 
addition, a polarization controller consisting of two half-wave plates and a single quarter- 
wave plate is used to maximize the signal. The reference and collection beams are combined 
by a 50/50 coupler and the resulting heterodyne signal is detected by a balanced InGaAs 
detector (D1, D2) with a bandwidth of 80 MHz. The resulting electronic signal is then 
processed with a band pass filter (BPF) with a 3 MHz bandwidth centered at 55 MHz, then 
demodulated (DM), digitized by a frame grabber (FG), and displayed (D). 
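
As a rough consistency check on the source parameters, the ratio $\lambda^2/\Delta\lambda$ is a common order-of-magnitude estimate of coherence length. The exact prefactor depends on the spectral shape and on the definition used, and the text does not state which convention lies behind its ~50 µm figure, so the sketch below is only an estimate under that assumption. The function name is ours.

```python
def coherence_length_um(center_nm, bandwidth_nm):
    """Order-of-magnitude coherence length lambda^2 / delta_lambda, in micrometers.

    Assumption: no spectral-shape prefactor and no refractive-index correction.
    """
    return center_nm ** 2 / bandwidth_nm / 1000.0

print(round(coherence_length_um(1345.0, 35.0), 1))  # 51.7
```

The estimate is of the same order as the value quoted for the source, which sets the length scale over which the coherence gate rejects multiply-scattered photons.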

In this heterodyne detection scheme, the reference beam essentially provides amplification 
of the weak collection beam via coherent optical mixing, and enables the measurement of 
reflected light with a dynamic range larger than 70 dB. Post-objective scanning is performed 
with the scan mirror (SM) placed distal to the objective lenses. Reflectance images were 
collected from fresh biopsy specimens taken from the squamo-columnar junction of subjects 
with Barrett's esophagus. Specimens with dimensions of ~3 mm were resected with jumbo 
biopsy forceps, and the mucosal surface was oriented normal to the z-axis. Vertical cross- 
sectional images were collected with depth of 1 mm. From Fig. 12b, squamous (normal) 
mucosa is present over the left half of the image with an intact epithelium (EP). The other 
structures of normal esophageal mucosa, including the muscularis mucosa (MM), sub- 
mucosa (SM), and muscularis propria (MP), can also be identified. Columnar mucosa 
consistent with intestinal metaplasia is seen over the right half of the image, and reveals the 
presence of pit epithelium (PE) [17]. These findings correlate with the tissue micro- 
structures seen on histology. 








Fig. 12. Vertical cross-sectional dual axes confocal reflectance images ex vivo, a) Schematic 
of optical circuit for heterodyne detection, details discussed in the text, b) Reflectance image 
of squamo-columnar junction in esophagus with vertical depth of 1 mm. Squamous 
(normal) mucosa reveals epithelium (EP) and muscularis mucosa (MM) over left half. 
Columnar (intestinal metaplasia) mucosa shows pit epithelium (PE) over right half. 
Submucosa (SM) and muscularis propria (MP) are seen on both sides. 

2. Fluorescence imaging mode 

Fluorescence detection adds an entirely new dimension to the imaging capabilities of the 
dual axes architecture. Detection in this mode offers an opportunity to achieve much higher 
image contrast compared to that of reflectance and is sensitive to labeled molecular probes 
that can identify specific tissue and cellular targets. These features provide a method to 
perform functional as well as structural imaging, and allow for the study of a wide variety 
of molecular mechanisms. Although the use of low NA objectives in the dual axes 
configuration reduces the collection efficiency of the optics by a factor of ~NA², this deficit 
can be overcome by the use of bright fluorophores. In order to achieve the deep tissue 
penetration depths possible with the off-axis collection of light, a near-infrared laser at 785 
nm is used as the source and a PMT with a long pass filter to block the excitation light is 
used as the detector [27]. The large dynamic range (>40 dB) of the dual axes confocal 
architecture encountered with collection of vertical cross-sectional images requires 
modulation of the PMT gain to compensate for the rapid decrease in fluorescence signal 
with axial depth due to tissue absorption and scattering. 







Fig. 13. Vertical cross-sectional dual axes confocal fluorescence images ex vivo. a) Squamo- 
columnar junction in esophagus with vertical depth of 500 μm. Squamous mucosa present 
over left half. Columnar (intestinal metaplasia) mucosa over right half shows crypts with 
goblet cells. b) Colon. Many goblet cells can be seen in dysplastic crypts from a flat colonic 
adenoma. 

In Fig. 13, vertical cross-sectional fluorescence images of a) esophagus and b) colon collected 
with a tabletop dual axes confocal microscope are shown [27]. These specimens were 
incubated with a near-infrared dye, LI-COR IRDye® 800 CW NHS Ester (LI-COR 
Biosciences, Inc) prior to imaging after being freshly excised during endoscopy. These 
images were collected at 2 frames per second with a transverse and axial resolution of 2 
and 3 μm, respectively. With use of post-objective scanning, a very large FOV of 800 μm 
wide by 500 μm deep was achieved. In Fig. 13a, the specimen was collected from the squamo-columnar 
junction of a patient with Barrett's esophagus. Over the left half of the image, the individual 
squamous cells from normal esophageal mucosa can be seen in the luminal to the basilar 
direction over a depth of 500 μm. Over the right half of the image, vertically oriented crypts 
with individual mucin-secreting goblet cells associated with intestinal metaplasia can be 
appreciated as brightly stained vacuoles. This diseased condition is associated with a greater 
than 100-fold relative risk of developing cancer in the esophagus. In Fig. 13b, the specimen 
was collected from a flat colonic adenoma, and the image reveals vertically oriented dysplastic 
crypts with individual goblet cells. 

Volume rendering can also be performed with the dual axes confocal microscope to 
illustrate three-dimensional (3D) imaging capabilities. These views are important for 
tracking cell movements, observing protein-protein interactions, and monitoring angiogenic 
development. A xenograft mouse model of glioblastoma multiforme has been developed by 
subcutaneously implanting ~10⁷ human U87MG glioblastoma cells in the flank of a nude 
mouse. Horizontal cross-sectional fluorescence images were collected with a tabletop 
instrument when the tumors reached ~1 cm in size. The mice were anesthetized for the in 
vivo imaging session, and indocyanine green (ICG) at a concentration of 0.5 mg/ml was 
injected intravenously to produce contrast. A skin flap overlying the tumor was exposed, 
and horizontal cross-sectional images were collected with a FOV of 400 × 500 μm². A 
fluorescence image collected at 50 μm below the tissue surface, shown in Fig. 14a, reveals 
that the glioblastoma has developed a dense, complex network of tortuous vasculature. A 
total of 400 horizontal cross-sectional images acquired at 1 μm increments were used to 
generate the 3D volumetric image, shown in Fig. 14b. Volume rendering was performed 
using Amira™ modeling software. 

Fig. 14. Dual axes confocal fluorescence images in small animal models. a) A horizontal 
cross-sectional fluorescence image of a human U87MG xenograft glioblastoma tumor 
implanted subcutaneously in the flank of a nude mouse was collected in vivo at 50 μm 
depth using i.v. indocyanine green (ICG). b) A 3D volumetric image is generated from a z- 
stack of 400 sections collected at 1 μm intervals. 

5. MEMS-scanner based dual axes confocal imaging instruments 

The long working distance created by the low NA objectives in the dual axes architecture 
provides space for a scanning mechanism to be placed in the post-objective position. This 
location is a key feature of the design that allows for scaling down of the optics to millimeter 
dimensions. Moreover, for in vivo imaging, a fast scan rate is needed to overcome motion 
artifacts introduced by organ peristalsis, heart beating, and respiratory activity, typically 
requiring a frame rate of >4 per second. As a result, we have developed miniature, 
endoscope-compatible imaging instruments based on a MEMS scanner without loss of 
performance. This strategy is much more complex than other approaches being developed, 
but is well suited to meet the size and speed requirements for in vivo imaging in a compact 
package [28,29]. 

L. MEMS Scanner Structure 

The schematic of the 2-D MEMS scanner is shown in Fig. 15a. It has a gimbal structure, and 
is electrostatically actuated by self-aligned, vertical combdrives to give large deflection. The 
mirror can be actuated with respect to the frame by rotating around the springs that define 
the inner axis. The frame supporting the mirror can be actuated with respect to the substrate 
by rotating around the springs that define the outer axis. Fig. 15b shows the cross-sections of 
various structures of the device, which is made in double-stacked silicon-on-insulator (SOI). 
The two device layers are each 30 μm thick with an oxide layer of 0.38 μm in between. The 
substrate thickness is 530 μm, while the oxide layer between the lower device layer and 
substrate has a thickness of 1 μm. The thick device layers increase the tilt range of the mirror 
by deeper comb engagement, and lead to a larger FOV. The mirror, movable combteeth, and 
inner torsional springs are fabricated in the upper device layer. The fixed combteeth, outer 
torsional spring, and frame consist of double-stacked layers. A backside window is located 
below the gimbal structure to release the device and allow large-range motion. Four 
actuation voltages are supplied to the lower layer to actuate both sides of each axis (outer: 
V1 and V2, inner: V3 and V4, in Fig. 15a). The upper layer and substrate are both grounded. 
Electrical isolation between the device layers and the substrate is provided by buried oxide 
layers, as seen in Fig. 15b. The double-stacked layers of the outer torsional spring and frame 
deliver actuation voltages and ground to the inner combdrives. 

Alignment of the off-axis illumination and collection is achieved with two mirrors 
connected together by a strut. The size of the mirrors (600 μm x 650 μm) and the distance 
between them (1.51 mm) enable an off-axis half-angle, θ, of 24.3°. The inner combdrives are 
placed on the connecting beam between the two mirrors and the inner springs are recessed 
into the mirror sides, to allow the die size to be reduced to 3.4 mm x 2.9 mm to fit inside the 
scanhead package. The frame width is designed to be 150 μm to prevent stress-induced 
curvature of the gimbal. In order to increase the torque with the same number of combs, the 
moment arm is lengthened by placing the outer combdrives further away from the outer 
torsional spring. 

M. Theoretical Analysis 

Fig. 15. a) Schematic drawing of the 2-D MEMS scanner. b) Cross-sectional view of various 
structures of the scanner. 

The outer spring consists of two silicon device layers (each 30 μm thick) with an oxide layer 
(0.38 μm) in between, and delivers three different voltages to the inner frame. It is 
60.38 μm thick and 350 μm long. The inner spring consists of one silicon layer and is 
therefore 30 μm thick. The mechanical torque of the torsional spring can be expressed as 



$$T_m = k_\phi \phi, \qquad (13)$$

where $k_\phi$ is the torsional spring constant, and $\phi$ is the mechanical deflection angle. Both 
torsional springs are rectangular, so $k_\phi$ is given as follows [30]. 

(14) 



Here G is the shear modulus given by G = E / [2(1 + ν)], where E is the Young's modulus, and 
ν is Poisson's ratio. The parameters l, w, and t represent the length, width, and thickness of the 
spring, respectively. 
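
As a rough illustration of how a torsional spring constant follows from G and the beam dimensions, the sketch below uses a standard handbook approximation for the torsion constant of a rectangular cross-section. This is an assumption and may differ from the exact expression used in [30]. The silicon elastic constants and the spring width are hypothetical placeholder values; only the 350 μm length and 60.38 μm thickness of the outer spring come from the text.

```python
def shear_modulus(youngs_modulus, poisson_ratio):
    """G = E / (2 (1 + nu)), as defined in the text."""
    return youngs_modulus / (2.0 * (1.0 + poisson_ratio))


def torsion_constant_rect(side_1, side_2):
    """Approximate torsion constant J of a rectangular section.

    Standard handbook approximation (an assumption, not taken from the chapter):
    J ~ a b^3 [1/3 - 0.21 (b/a) (1 - b^4 / (12 a^4))], with a the longer side.
    """
    a, b = max(side_1, side_2), min(side_1, side_2)
    return a * b**3 * (1.0 / 3.0 - 0.21 * (b / a) * (1.0 - b**4 / (12.0 * a**4)))


def torsional_spring_constant(length, width, thickness, youngs_modulus, poisson_ratio):
    """k = G J / l for one rectangular torsion beam, in N*m/rad (SI inputs)."""
    g = shear_modulus(youngs_modulus, poisson_ratio)
    return g * torsion_constant_rect(width, thickness) / length


# Placeholder material constants and width (hypothetical, for illustration only).
E_SILICON_PA = 169e9
NU_SILICON = 0.22
SPRING_WIDTH_M = 10e-6

# Outer spring length and thickness as quoted in the text.
k_outer = torsional_spring_constant(350e-6, SPRING_WIDTH_M, 60.38e-6,
                                    E_SILICON_PA, NU_SILICON)
print(f"outer spring constant (one beam, assumed width): {k_outer:.3e} N*m/rad")
```

Because the short side enters cubed, the result is far more sensitive to the smaller cross-sectional dimension than to the larger one.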

When an electrical voltage, V, is applied between the fixed and movable combs, the 
electrical torque, $T_e$, is given as follows. 

$$T_e(\phi, V) = \frac{N V^2}{2} \frac{\partial C_{unit}}{\partial \phi} \qquad (15)$$

Here N is the number of comb pairs, and $C_{unit}$ is the capacitance of a comb pair. In steady 
state, the mechanical torque of the spring is balanced by the electrostatic torque of the 
combdrives, and torques expressed by equations (13) and (15) are equal. A hybrid program 
which combines a 2-D finite element method (FEM) with analytical calculation [31] is used 
for generating the simulated DC transfer curves of Fig. 16. 
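
The steady-state condition, spring torque equal to electrostatic torque, can be solved numerically for the deflection angle at each drive voltage. The following is a minimal sketch, not the hybrid FEM program of [31]: the capacitance derivative `dc_dphi` is a hypothetical user-supplied function (in the source it comes from 2-D FEM), and the numbers in the example are placeholders.

```python
def electrostatic_torque(phi, voltage, n_pairs, dc_dphi):
    """T_e = (N V^2 / 2) * dC_unit/dphi, evaluated at deflection phi (rad)."""
    return 0.5 * n_pairs * voltage**2 * dc_dphi(phi)


def static_deflection(voltage, k_phi, n_pairs, dc_dphi, phi_max, tol=1e-12):
    """Solve k_phi * phi = T_e(phi, V) on [0, phi_max] by bisection."""
    def residual(phi):
        return k_phi * phi - electrostatic_torque(phi, voltage, n_pairs, dc_dphi)

    lo, hi = 0.0, phi_max
    if residual(lo) >= 0.0:
        return 0.0        # no net electrostatic torque at rest
    if residual(hi) < 0.0:
        return phi_max    # spring does not balance the drive within the range
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if residual(mid) < 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


# Hypothetical example: constant dC/dphi, so deflection scales with V^2.
def constant_dc(phi):
    return 1e-14          # F/rad, placeholder value


for volts in (0.0, 50.0, 100.0):
    angle = static_deflection(volts, k_phi=1e-6, n_pairs=100,
                              dc_dphi=constant_dc, phi_max=0.2)
    print(f"V = {volts:5.1f} V -> phi = {angle:.6f} rad")
```

Sweeping the voltage and plotting the resulting angles gives a DC transfer curve of the same kind as the simulated one, provided a realistic `dc_dphi` is supplied.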

Fig. 16. Static optical deflection curve of MEMS scanner. 

N. Device Fabrication 

[Fig. 17 panel labels: 1) DRIE of coarse patterns (Mask 1); 2) Upper device layer bonding; 
3) Self-alignment mask patterning (Mask 2) after LTO deposition; 4) Partial etching of LTO 
(Mask 3); 5) DRIE of upper device layer; 6) DRIE of lower device layer by Mask 2 and upper 
device layer by Mask 3; 7) Backside patterning and DRIE for release (Mask 4). Legend: single 
crystal silicon (SCS), thermal oxide, low temperature oxide (LTO).] 

Fig. 17. MEMS scanner fabrication process flow. 

The design of this mirror uses a gimbal geometry to perform scanning in the horizontal (X- 
Y) plane, with rotation around inner and outer axes defined by the location of the 
respective springs. The overall structure has a barbell shape with two individual mirrors 
that have active surface dimensions of 600 × 650 μm². A 1.51 mm long strut connects these 
two mirrors so that the illumination and collection beams preserve the overlapping focal 
volume in the tissue. The fabrication process flow, shown in Fig. 17, starts with an SOI wafer 
composed of a silicon substrate, buried oxide, and lower silicon device layer that are 530, 
0.38, and 30 μm thick, respectively [32]. A deep-reactive-ion-etch (DRIE) of coarse patterns, 
including the combdrives and trenches, is performed on the SOI wafer with Mask 1 (step 1). 
Next, an oxide layer is grown on a plain silicon wafer using a wet oxidation process. This 
wafer is then fusion bonded onto the etched surface of the SOI wafer (step 2). The yield is 
increased by bonding in vacuum, and the bonded plain wafer is ground and polished down 
to 30 μm thickness, forming the upper device layer. 

The two oxide layers between the silicon layers provide electrical isolation, and act as etch 
stops, allowing for precise thickness control. The front side of the double-stacked SOI wafer 
is patterned and DRIE etched to expose the underlying alignment marks in the lower device 
layer. Then, a low temperature oxide (LTO) layer is deposited on both sides of the wafer. 
The front side layer is patterned by two masks. The first mask (Mask 2) is the self-alignment 
mask (step 3), and is etched into the full thickness of the upper LTO layer. The second mask 
(Mask 3) is mainly for patterning the electrodes for voltage supplied to the lower device 
layer (step 4). It goes through a partial etch leaving a thin layer of LTO. The alignment 
accuracy of each step needs to be better than g/2, where g is the comb gap. Since most devices have 
6 μm comb gaps, this leads to a required alignment accuracy of better than 3 μm. 

Fig. 18. SEM of 2D gimbaled MEMS scanner, scale bar 500 μm. 

Good alignment accuracy is important to minimize failures due to electrostatic instability 
during actuation. These three masks eventually define the structures in the upper, lower, 
and double-stacked layers. After the front side patterning is done, the LTO layer on the 
wafer back side is stripped (step 5). The wafer is cleaned and photoresist is deposited on the 
back side. Then, front side alignment marks are patterned. Next, the upper silicon layer is 
etched with the features of Mask 2 in DRIE. Then, the thin LTO and buried oxide layers are 
anisotropically dry-etched. Finally, the lower and upper silicon layers are etched (DRIE) 
simultaneously with features patterned by Mask 2 and 3, respectively (step 6). 

For backside processing, the wafer is bonded to an oxidized handle wafer with photoresist. 
The back side trenches are patterned with Mask 4 on photoresist (step 7). The back side 
trench should etch through the substrate to release the gimbal structure, so handle wafer 
bonding and thick resist are required for DRIE. Alignment to the front side features is 
accomplished by aligning to the previously etched patterns. After the substrate (530 μm) is 
etched by DRIE, the process wafer is separated from the handle wafer with acetone. After 
wafer cleaning, the exposed oxide layer is directionally dry-etched from the back side. 
Finally, the remaining masking LTO and exposed buried oxide layers are directionally etched 
from the front side. 

O. Device Characterization 

The two-dimensional (2D) MEMS scanner is actuated by electrostatic vertical combdrive 
actuators [33]. Electrostatic actuation in each direction is provided by two sets of vertical 
comb actuators that generate a large force to produce sizable deflection angles. The scanning 
electron micrograph (SEM) of the scanner is shown in Fig. 18. There are 4 actuation voltages 
(V1, V2, V3, and V4) that power the device. The parameters of the scanner are characterized 
for quality control purposes prior to use in the miniature dual axes confocal microscope. 
First, the flatness of the mirror is measured with an interferometric surface profiler to 
identify micro-mirrors that have a peak-to-valley surface deformation <0.1 μm. The scanner 
is metalized with 10 nm thick aluminum (reflectivity = 67% at 785 nm wavelength) to 
increase reflectivity. 

Fig. 19. Frequency response of MEMS mirror shows resonant peaks at 0.5 kHz (outer axis) 
and 2.9 kHz (inner axis) to achieve real time operation. 

The radius of curvature of the mirror is greater than 60 cm with an average surface 
roughness of 7 nm. Static optical deflections of ±1.5 deg at 180 volts and ±4.25 deg at 150 
volts are achieved for the outer and inner axis, respectively. The resonant frequencies are 0.5 
kHz with ±10 deg optical angle at resonance for the outer axis and 2.9 kHz with ±17 deg 
optical angle at resonance for the inner axis. The frequency response of the device is shown 
in Fig. 19. Parametric resonances can sometimes be observed in the inner axis near 
frequencies of 2f₀/n, where n is an integer >1 [34]. This phenomenon is caused by the 
non-linear response of the torsional combdrives, which leads to sub-harmonic oscillations. 
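
To make the 2f₀/n relation concrete, the short sketch below lists the drive frequencies at which parametric resonance of the inner axis might be expected, taking f₀ = 2.9 kHz from the text. The range of orders evaluated (n = 2 to 5) is an arbitrary choice for illustration.

```python
def parametric_resonance_frequencies(f0_hz, orders):
    """Return the drive frequencies 2*f0/n for each integer order n."""
    return [2.0 * f0_hz / n for n in orders]


F0_INNER_HZ = 2900.0          # inner-axis resonance quoted in the text
ORDERS = range(2, 6)          # illustrative choice of orders

for n, f in zip(ORDERS, parametric_resonance_frequencies(F0_INNER_HZ, ORDERS)):
    print(f"n = {n}: {f:7.1f} Hz")
```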

6. Dual axes scanhead 

P. Scanhead design 

The design and integration of the miniature dual axes scanhead is a very challenging part of 
the development of this novel imaging technique because of the small size required for 
compatibility with medical endoscopes. This process requires a package that allows for 
precise mounting of the following optical elements: 1) two fiber-coupled collimators, 2) 2D 
MEMS scanner, 3) parabolic focusing mirror, and 4) hemispherical index-matching solid- 
immersion-lens (SIL) [35]. The basic design of the miniature scanhead is shown in Fig. 20. 
Two collimated beams are focused at an inclination angle θ to the z-axis by a parabolic 
mirror with a maximum cone half-angle α to an overlapping focal volume below the tissue 
surface after being deflected by the 2D MEMS scanner. The flat side of the SIL is placed 
against the tissue, and the curved surface accommodates the incident beams at normal 
incidence to minimize aberrations. The parabolic mirror is fabricated using a replicated 
molding process that provides a surface profile and smoothness needed for diffraction- 
limited focusing of the two collimated beams. Once the beams are aligned parallel to each 
other, the parabolic mirror then provides a "self-aligning" property that forces the focused 
beams to intersect at a common focal point below the tissue surface. Focusing is performed 
primarily by the parabolic mirror which is a non-refractive optical element with an NA of 
0.12. This feature allows for the optical design to be achromatic. That is, light over a broad 
spectral regime can be focused to the same point below the tissue surface simultaneously, 
allowing for future multi-spectral imaging to be performed. 



Fig. 20. Miniature dual axes scanhead. Two collimated beams are focused by a parabolic 
mirror at angle θ to the z-axis for en face scanning by the 2D MEMS mirror. The solid- 
immersion lens (SIL) minimizes aberrations to the incident beams. 

Q. Scanhead alignment and packaging 

Alignment of the two beams in this configuration is a key step to maximizing imaging 
performance. This step is accomplished by locating the two fiber-pigtailed collimators in a 
pair of v-grooves that are precision machined into the housing, as shown in Fig. 21a [35]. An 
accuracy of 0.05 deg can be achieved in aligning the two beams parallel to one another using 
the v-grooves with pre-assembled fiber collimators. Additional precision in alignment can 
be attained with use of Risley prisms (optical wedges) introduced into the light paths to 
provide fine steering of the collimated beams to bring the system into final alignment. These 
prisms are angled at 0.1 deg, and can be rotated to steer the collimated beam in an arbitrary 
direction over a maximum range of ~0.05 deg. This feature maximizes the overlap of the two 
beams after they are focused by the parabolic mirror. Two wedges are used in each beam so 
that complete cancellation of the deflection by each can be achieved, if needed, to provide 
maximum flexibility. 
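
The steering range and the cancellation property of a wedge pair can be illustrated with the thin-prism approximation, in which a wedge of apex angle α and refractive index n deviates a beam by about (n − 1)α. The sketch below assumes n ≈ 1.5 (the wedge material is not stated in the text), which gives roughly 0.05 deg per 0.1 deg wedge. The function names are hypothetical.

```python
import math


def wedge_deviation_deg(wedge_angle_deg, refractive_index):
    """Thin-prism approximation: deviation ~ (n - 1) * wedge angle."""
    return (refractive_index - 1.0) * wedge_angle_deg


def risley_pair_deviation(delta_deg, rot1_deg, rot2_deg):
    """Net (x, y) deviation of two identical thin wedges.

    Each wedge deviates the beam by delta_deg in the direction set by its
    rotation angle about the beam axis; small deviations add as vectors.
    """
    r1, r2 = math.radians(rot1_deg), math.radians(rot2_deg)
    x = delta_deg * (math.cos(r1) + math.cos(r2))
    y = delta_deg * (math.sin(r1) + math.sin(r2))
    return x, y


ASSUMED_INDEX = 1.5                       # assumption, not from the text
delta = wedge_deviation_deg(0.1, ASSUMED_INDEX)

aligned = risley_pair_deviation(delta, 0.0, 0.0)      # deviations add
opposed = risley_pair_deviation(delta, 0.0, 180.0)    # deviations cancel
print("single wedge:", delta, "deg")
print("aligned pair:", math.hypot(*aligned), "deg")
print("opposed pair:", math.hypot(*opposed), "deg")
```

Rotating the two wedges relative to each other sets the magnitude of the net deviation anywhere between zero and twice the single-wedge value, and rotating them together sets its direction.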

Fig. 21. Alignment and assembly of dual axes scanhead. a) Precision machined v-grooves 
and Risley prisms provide coarse and fine alignment, respectively, of the two beams. b) 
Axial (z-axis) displacement of the MEMS chip is made with a slider mechanism. 

Axial (z-axis) displacement of the MEMS chip is performed with a computer-controlled 
piezoelectric actuator (Physik Instrumente GmbH, P-783.ZL actuator, and E-662.LR 
controller) that moves a slider along 3 mechanical supports, shown in Fig. 21b. This feature 
adjusts the imaging depths to collect a stack of en face images to produce the 3D volume 
rendered images. The distal end of the slider has a mounting surface to attach the printed 
circuit board (PCB), which supports the MEMS chip, wire bonding surfaces, and soldering 
terminals. 

A mixture of conductive epoxy (adhesive resin ECCOBOND Solder 56 C and Catalyst 9, 
Emerson & Cuming, Inc.) is used to attach aluminum-1% silicon bonding wire 
(Semiconductor Packaging Materials, Inc.) from the bond pads on the MEMS die to those on 
the PCB. Electrical power is delivered to the mirror via wires that run through the middle of 
the housing, and are soldered onto the PCB terminals. The z-axis translational stage consists 
of a closed-loop piezoelectric linear actuator. Finally, the scanhead assembly is covered and 
sealed from the environment using UV-curing glue to prevent inward leakage of bodily 
fluids. 

Fig. 22. a) Assembly of the dual axes confocal scanhead mounted on a V-block. b) Gimbaled 
2D MEMS scanner wire bonded onto the PCB, scale bar 2 mm. 

Packaging of the 10 mm diameter scanhead mounted on a V-block stage is shown in Fig. 
22a. A piezoelectric micro-motor (μ-motor) is used to perform vertical depth translation (z-axis). 
The MEMS scanner (die size 3.2 (w) × 2.9 (h) mm²) mounted on the PCB is shown in Fig. 
22b. 

R. Instrument control and data acquisition 

Both the data acquisition and MEMS actuation systems are controlled using LabVIEW™ 
with the Vision Acquisition software package and two National Instruments data acquisition 
(DAQ) boards (PXI-6711 and PXI-6115). The frequency and amplitude of the actuation 
signals control the frame rate and FOV of the MEMS scanner. Four live wires and one ground 
wire provide voltage to the device and are connected to the wirebond pads on the PCB 
via an ultrasonic wedge bonding technique. 

For each 2D en face image, the MEMS scanner is resonantly driven 180 deg out of phase to 
maximize the linear region of the angular deflection [14, 29] around the outer axis (V1 and 
V2) at 1.22 kHz with a unipolar sine wave at a maximum of 70 V, while rotation around the 
MEMS scanner inner axis (V3 and V4) is driven 180 deg out of phase in the DC mode (5 Hz) 
with a unipolar sawtooth waveform at a maximum of 200 V (AgilOptics, Inc). The unipolar 
sawtooth waveform is smoothed at the transition edges to mitigate higher frequency ringing 
from the inner axis. The step size and depth scan range of the piezoelectric actuator (vertical 
translation) can be adjusted to optimize the acquisition of the 3D datasets. 

The PMT gain is synchronously adjusted to compensate for reduced light collected at 
increased tissue depths. Automated frame averaging and display can be performed to 
reduce noise and improve image quality during imaging. 2D en face images from the analog 
input channel are acquired and displayed in real time to enable continuous monitoring or 
visualization of the sample. All images are acquired in 16-bit data format. 3D volumetric data 
can be rendered by post-processing using Amira™ software (Visage Imaging, Inc). 
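
The drive scheme described above can be sketched in a few lines. This is an illustration only, not the LabVIEW implementation: the push-pull interpretation of "180 deg out of phase" for the sawtooth, the raised-cosine flyback used for edge smoothing, and the 10% flyback fraction are all assumptions, and the function names are hypothetical. Frequencies and peak voltages are those quoted in the text.

```python
import math


def unipolar_sine_pair(freq_hz, v_max, t):
    """Two unipolar sine drives, 180 deg out of phase, each spanning 0..v_max."""
    s = math.sin(2.0 * math.pi * freq_hz * t)
    return 0.5 * v_max * (1.0 + s), 0.5 * v_max * (1.0 - s)


def smoothed_sawtooth(freq_hz, v_max, t, flyback_fraction=0.1):
    """Unipolar sawtooth whose falling edge is a raised cosine, not a step."""
    phase = (t * freq_hz) % 1.0
    ramp_end = 1.0 - flyback_fraction
    if phase < ramp_end:
        return v_max * phase / ramp_end               # linear ramp 0 -> v_max
    u = (phase - ramp_end) / flyback_fraction          # 0..1 across the flyback
    return 0.5 * v_max * (1.0 + math.cos(math.pi * u)) # smooth return to 0


def sawtooth_pair(freq_hz, v_max, t, flyback_fraction=0.1):
    """Push-pull pair for the slow axis (assumed meaning of 180 deg out of phase)."""
    v = smoothed_sawtooth(freq_hz, v_max, t, flyback_fraction)
    return v, v_max - v


FAST_HZ, FAST_VMAX = 1220.0, 70.0     # outer axis, from the text
SLOW_HZ, SLOW_VMAX = 5.0, 200.0       # inner axis, from the text
FAST_CYCLES_PER_FRAME = FAST_HZ / SLOW_HZ

t = 0.05                              # seconds into the scan
print("V1, V2:", unipolar_sine_pair(FAST_HZ, FAST_VMAX, t))
print("V3, V4:", sawtooth_pair(SLOW_HZ, SLOW_VMAX, t))
print("fast-axis cycles per frame:", FAST_CYCLES_PER_FRAME)
```

With these settings the fast axis completes 244 cycles during each 5 Hz frame, which bounds the number of scan lines available per image.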



7. Prototype systems 

S. Handheld instrument 

We first developed a 10 mm diameter handheld instrument, schematic shown in Fig. 23a 
[14, 29]. A semiconductor laser delivers 25 mW of light (λ = 785 nm) into a single mode fiber 
(SMF, Fibercore Limited, SM750). The fiber terminates in a 1.8 mm diameter gradient index 
(GRIN) collimator (GRINTECH, GmbH). The output beam diameter (1/e²) from the 
collimator is 0.9 mm. The half angle θ between the input (illumination) and output 
(collection) beams is 24.3°. The input beam is focused by an aluminum coated parabolic 
mirror (PM) with the focal length of 4.6 mm (Anteryon BV) and reflects off the first mirror 
surface of the MEMS scanner. The center-to-center distance between the two collimators is 
3.7 mm. The focused beam continues through a fused silica SIL (hemispheric lens) until it 
reaches the focal plane below the tissue surface. The SIL has a refractive index (1.47) that is 
similar to that of the tissue and this material was chosen for index matching. The beams 
enter the air-silica interface at normal incidence to minimize aberrations as the focal point is 
scanned. Scattered light from the overlapping focal volume is collected through the optical 
window provided by the SIL and reflected off the MEMS mirror to the opposite surface of 
the parabolic mirror. The collected light is then focused onto the output fiber collimator for 
delivery to the PMT. 

[Fig. 23a labels: parabolic mirror, MEMS scanner, hemispherical lens, endoscope head, 
imaging head, 785 nm laser light source, 790 nm LP filter, PMT, HV amplifier, DAQ 
computer, actuator.] 

Fig. 23. Handheld prototype system. a) Schematic of complete instrument. b) Portable 
system demonstrated. c) Packaged handheld (10 mm diameter) dual axes confocal 
microscope with piezoelectric actuator in the handle, scale bar 10 mm. 

As the MEMS mirror raster scans the overlapping beams, the 2D en face image is 
continuously displayed on a computer monitor using a frame grabber and image acquisition 
software. Intensity mapping of each 2D en face image is performed by reading the MEMS 
scanner driving voltages and estimating the focal beam trajectory. A 3D volumetric stack is 
created by post-processing and rendering a series of 2D en face images. Each 3D 
volumetric image is obtained by translating the MEMS scanner with the piezoelectric micro- 
motor in the z-direction under computer-control. Imaging can be performed in reflectance or 
fluorescence mode by inserting a 790 nm long pass optical filter (LP-02-785RU-25, Semrock, 
Inc.) in the collection path for the latter case. The maximum output laser power on the 
sample is 2 mW. A photograph of a fully-packaged miniature dual axes confocal microscope 
is shown in Fig. 23c. 

T. Endoscope-compatible instrument 

We scale down the basic design of the 10 mm diameter handheld instrument to develop the 
5.5 mm endoscope-compatible version, shown in Fig. 24 [36]. This prototype uses the same 
replicated parabolic focusing and MEMS mirrors as those employed in the larger prototype. 
A pair of smaller (1 mm) diameter fiber-coupled GRIN collimator lenses is used in the 
smaller version. Alignment is provided by a pair of 1 mm diameter rotating wedges (Risley 
prisms), which are inserted into the path of one of the collimated beams. The collimators 
and Risley prisms are both located by precision wire-EDM machined v-grooves and fixed 
into place with UV curing glue. As with the larger prototype, the combined precision of the 
v-grooves and the pointing accuracy of the pre-assembled fiber collimators allow for the 
collimated beams to become parallel to each other to within ~0.05 deg accuracy. The 
alignment wedges have a small (0.1 deg) angle, which allows for steering of a collimated 
beam over a range of about 0.05 deg in any direction as each wedge is rotated. 




Fig. 24. Endoscope-compatible dual axes confocal microscope. a) Microscope passes through 
the instrument channel of an Olympus GIF XTQ160 therapeutic upper endoscope that has a 6 
mm diameter instrument channel. b) Distal end of endoscope shows the protruding dual 
axes microendoscope. 

This smaller package design also accommodates a slider mechanism, which is used for axial 
(z-axis) scanning of the MEMS chip to provide variable imaging depths within the tissue 
and for generating 3D volumetric images. This smaller slider mechanism comprises a single 
rod, which moves within a precision hole drilled through the housing. The MEMS chip is 
mounted by an adhesive to a PCB, which is in turn mounted onto the slider. The PCB 
provides bondpads to accommodate wire bonding to the MEMS chip and also to provide 
soldering terminals for the external control wires that power the scan mirror. In Fig. 24a, the 
endoscope-compatible dual axes confocal microscope is shown inserted through the 6 mm 
diameter instrument channel of a therapeutic upper endoscope (Olympus GIF XTQ160). A 
magnified view of the distal tip is shown in Fig. 24b. 

8. Imaging results 

U. Reflectance imaging 

Instrument characterization is performed in reflectance mode by imaging the chrome surface 
of a standard (USAF) resolution target, which also serves as the sample for measuring image 
resolution and FOV. The transverse resolution was measured by the knife-edge method, 
defined by 10% to 90% of maximum intensity points, and found to be 5 μm [36]. The axial 
resolution, defined by FWHM, is measured by translating a plane mirror in the z-direction 
and was found to be 7 μm. 
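
The knife-edge criterion can be applied to a sampled intensity profile across an edge as sketched below. This is a generic illustration rather than the authors' analysis code: the 10% and 90% levels are taken relative to the minimum and maximum of the profile (an assumption about normalization), crossings are found by linear interpolation, and the profile is assumed to rise monotonically. The synthetic test edge is a Gaussian-blurred step with a hypothetical width.

```python
import math


def crossing_position(positions, values, level):
    """First position where a rising profile crosses `level` (linear interpolation)."""
    for i in range(1, len(values)):
        v0, v1 = values[i - 1], values[i]
        if v0 < level <= v1:
            frac = (level - v0) / (v1 - v0)
            return positions[i - 1] + frac * (positions[i] - positions[i - 1])
    raise ValueError("profile never crosses the requested level")


def edge_width_10_90(positions, intensities):
    """Distance between the 10% and 90% points of a rising edge profile."""
    lo, hi = min(intensities), max(intensities)
    span = hi - lo
    x10 = crossing_position(positions, intensities, lo + 0.1 * span)
    x90 = crossing_position(positions, intensities, lo + 0.9 * span)
    return x90 - x10


# Synthetic edge: step blurred by a Gaussian of standard deviation SIGMA_UM.
SIGMA_UM = 2.0                                   # hypothetical value
xs = [-20.0 + 0.1 * i for i in range(401)]       # positions in micrometers
edge = [0.5 * (1.0 + math.erf(x / (SIGMA_UM * math.sqrt(2.0)))) for x in xs]

print(f"10-90% edge width: {edge_width_10_90(xs, edge):.2f} um")
```

For a Gaussian-blurred edge the 10% to 90% width is about 2.56 times the standard deviation, which gives a quick consistency check on a measured profile.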




Fig. 25. Reflectance image of standard (USAF) resolution target collected with handheld 
dual axes confocal microscope, scale bar 20 μm. 

Fig. 25 shows a reflectance image collected with the handheld confocal microscope that 
reveals clear visualization of group 7 of the USAF resolution target. The measured values 
are slightly larger than the theoretical resolutions of 4.5 μm for the transverse dimensions, 
and 6.0 μm for the axial dimension. This is mainly due to the decrease in effective NA of the 
imaging system from the truncation of both input and output collimated beams by the 
width dimension of the MEMS scanner die. All acquired images are captured at 5 
frames/second with the largest FOV of 800 × 450 μm² (900 × 506 pixels). This FOV is much 
larger than that of most other miniature confocal instruments, and is achieved with use of 
post-objective scanning. 
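
A quick check of the sampling implied by these numbers is sketched below. It assumes that the 800 μm dimension maps onto 900 pixels and the 450 μm dimension onto 506 pixels, and it uses the measured 5 μm transverse resolution. The function names are hypothetical.

```python
def pixel_pitch_um(fov_um, n_pixels):
    """Size of one pixel in the sample plane."""
    return fov_um / n_pixels


def pixels_per_resolution_element(resolution_um, fov_um, n_pixels):
    """How many pixels span one resolvable element."""
    return resolution_um / pixel_pitch_um(fov_um, n_pixels)


pitch_x = pixel_pitch_um(800.0, 900)
pitch_y = pixel_pitch_um(450.0, 506)
samples = pixels_per_resolution_element(5.0, 800.0, 900)

print(f"pixel pitch: {pitch_x:.3f} um (x), {pitch_y:.3f} um (y)")
print(f"pixels per 5 um resolution element: {samples:.2f}")
```

Under these assumptions each resolution element is covered by more than five pixels, comfortably above the two-pixel minimum of Nyquist sampling, so the pixel grid does not limit the measured resolution.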

V. Ex vivo fluorescence imaging 

1. Handheld dual axes confocal instrument 

The 3D fluorescence imaging capability of the handheld dual axes confocal instrument is 
shown in Fig. 26. Excised tissue specimens of normal and dysplastic colonic mucosa are 
soaked in 0.5 mg of LI-COR IRDye® 800 CW NHS Ester (LI-COR Biosciences, Inc) diluted in 
10 ml of phosphate-buffered saline (PBS) at neutral pH for 5 minutes and then rinsed with 
water to remove excess dye. After imaging, the specimens are fixed in 10% buffered 
formalin, cut into 5 μm sections, and processed for histology with hematoxylin and eosin 
(H&E). All ex vivo images are obtained from freshly excised human tissues (obtained with 
informed consent at the VA Palo Alto Health Care System). 




Fig. 26. Ex vivo images. En face dual axes confocal images of a) normal and d) dysplastic colonic 
mucosa. Corresponding histology (H&E) of b) normal and e) dysplasia. 3D volumetric 
images of c) normal and f) dysplasia, scale bar 100 μm. 

Fig. 26a, b, and c and Fig. 26d, e, and f show the en face image, histology (H&E), and 3D 
volumetric images of normal and adenomatous (dysplastic) colonic mucosa, respectively, 
scale bar 100 μm [36]. Features of colonic crypts, including colonocytes and crypt lumens, 
are clearly resolved. Fig. 26c and 26f show three extracted en face planes at 50, 170, and 230 
μm below the tissue surface. The gain is increased with depth to compensate for the lower 
signal levels. 

2. Endoscope-compatible dual axes confocal instrument 

A 2D en face fluorescence image of normal colonic mucosa collected ex vivo with the 5.5 mm 
diameter endoscope-compatible dual axes confocal prototype is shown in Fig. 27, scale bar 
100 μm. ICG was topically applied to enhance contrast, and the pseudocolor image shows 
dye enhancement in the lamina propria surrounding the circular shaped crypts. 

Fig. 27. En face fluorescence image of normal colonic mucosa collected with endoscope- 
compatible dual axes confocal prototype ex vivo using topically applied ICG to enhance 
contrast shows regular crypt pattern, scale bar 100 μm. 

W. In vivo fluorescence imaging 

In vivo imaging with the handheld dual axes confocal microscope has also been 
demonstrated. A mouse was anesthetized, and 10 mg of indocyanine green (Sigma-Aldrich, 
Corp) diluted in 10 ml of PBS was injected into the retro-orbital plexus of the mouse. 
Imaging was performed by resting the mouse on a translational stage and placing its ear 
intact on the SIL window of the microscope. Fig. 28a shows an in vivo image of blood vessels 
en face with a maximum intensity projection. Fig. 28b shows a 3D volumetric rendering of 
the image stack obtained by scanning from the surface to 150 μm deep into the intact ear. 
The images were collected at 3 μm intervals along the z-axis by using the piezoelectric 
actuator. All images were taken at 5 Hz with 5 frames averaging (1 second per image). The 
full 3D volume rendered image was acquired in 50 seconds, scale bar 100 μm. 
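
The acquisition time quoted above follows directly from the scan parameters, and the maximum intensity projection is a simple per-pixel reduction over the z-stack. The sketch below illustrates both. It assumes 50 planes for a 150 μm range at 3 μm steps (one end point excluded), which reproduces the stated 50 seconds. The function names and the tiny example stack are hypothetical.

```python
def number_of_planes(depth_range_um, step_um):
    """Planes in the z-stack, assuming one end point of the range is excluded."""
    return int(round(depth_range_um / step_um))


def stack_acquisition_time_s(n_planes, frame_rate_hz, frames_averaged):
    """Total time when each plane is an average of several frames."""
    return n_planes * frames_averaged / frame_rate_hz


def maximum_intensity_projection(stack):
    """Per-pixel maximum over a list of equally sized 2D images (lists of rows)."""
    rows, cols = len(stack[0]), len(stack[0][0])
    return [[max(image[r][c] for image in stack) for c in range(cols)]
            for r in range(rows)]


planes = number_of_planes(150.0, 3.0)
total_s = stack_acquisition_time_s(planes, frame_rate_hz=5.0, frames_averaged=5)
print(f"{planes} planes, {total_s:.0f} s per volume")

# Tiny hypothetical stack of three 2x2 images.
example_stack = [
    [[1, 5], [0, 2]],
    [[4, 1], [3, 2]],
    [[2, 2], [9, 0]],
]
print(maximum_intensity_projection(example_stack))
```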
In addition, in vivo images of human skin collected with the handheld dual axes confocal
microscope are shown in Fig. 29. A sequence of approximately 300 individual en face images
of human skin was collected at a fixed depth of 60 µm below the tissue surface (stratum
corneum) at a rate of 5 Hz [36]. Topically applied indocyanine green was used for
contrast. Image stitching, or mosaicing, was performed in real time with custom mosaicing
software to enlarge the FOV and to increase the signal-to-noise ratio, as shown in Fig. 29a. The
Fig. 28. a) A maximum intensity projected in vivo image of blood vessels in an intact mouse
ear collected with handheld prototype, b) A 3D volume rendered image of blood vessels,
scale bar 100 µm.






Fig. 29. a) Image mosaic of human skin acquired in vivo at a depth of 60 µm composed of
roughly 300 images. The white box shows the corresponding location of individual images,
b) A single input image; c) the corresponding area of the mosaic with improved
signal-to-noise ratio, scale bars 50 µm.
white rectangular box in Fig. 29a represents an individual en face image (100 × 400 µm²)
obtained with the dual axes confocal microscope. The images were mosaiced by first
correcting the image borders for scanning distortions. Then, each new image was registered
and blended before proceeding to the next. Fig. 29b and 29c demonstrate how image
mosaicing can increase the signal-to-noise ratio and dramatically improve image quality by
tuning the amount of image overlap. The maximum input frame rate that our computer
acquisition can process with the real-time mosaicing algorithm is 15 frames/second.
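
The register-and-blend step can be sketched as follows. The text does not say which registration or blending method the custom software uses, so this example substitutes two common choices as assumptions: phase correlation for estimating the translation between frames, and plain averaging of overlapping pixels for blending. The names `estimate_shift` and `blend_into` are hypothetical. Averaging N overlapping frames with uncorrelated noise improves the signal-to-noise ratio by roughly the square root of N, which is why the amount of overlap matters.

```python
import numpy as np


def estimate_shift(ref, img):
    """Integer (dy, dx) such that img ~ np.roll(ref, (dy, dx), axis=(0, 1)).

    Uses phase correlation: the normalized cross-power spectrum of two
    translated images transforms back to a sharp peak at the translation.
    """
    cross = np.fft.fft2(img) * np.conj(np.fft.fft2(ref))
    cross /= np.abs(cross) + 1e-12          # keep phase only
    corr = np.fft.ifft2(cross).real
    dy, dx = np.unravel_index(np.argmax(corr), corr.shape)
    # Map peak positions in the upper half of the range to negative shifts
    if dy > ref.shape[0] // 2:
        dy -= ref.shape[0]
    if dx > ref.shape[1] // 2:
        dx -= ref.shape[1]
    return int(dy), int(dx)


def blend_into(acc, cnt, frame, top, left):
    """Add a registered frame to the running sum and per-pixel hit count."""
    h, w = frame.shape
    acc[top:top + h, left:left + w] += frame
    cnt[top:top + h, left:left + w] += 1


# The mosaic is the per-pixel average: acc / np.maximum(cnt, 1)
```

This sketch handles pure translation only. The distortion correction of the image borders mentioned above would have to be applied to each frame before registration.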

9. Conclusions and future directions 

In this chapter, we present the theory, design and implementation of a novel dual axes 
confocal microscope in both tabletop and miniature form factors. Separate illumination and 
collection of light using the region of overlap between the two beams (focal volume) provide 
a number of advantages for purposes of miniaturization and in vivo imaging. The 
instruments were developed with 785 nm illumination to take advantage of the "optical
window" in tissue where the high dynamic range and deep tissue penetration of this novel
architecture can be demonstrated. This instrument is able to achieve sub-cellular resolution
(~5 µm), sufficient for in vivo histopathological evaluation. Performance of the dual axes
confocal microscope is demonstrated by collecting both en face images in real time and 3D
volumetric images with post-processing at a maximum interrogating depth of 300 µm for
both ex vivo and in vivo samples. Furthermore, we used this instrument as a test bed to 
further scale down the dimensions of this architecture to a 5.5 mm diameter package for 
endoscope compatibility. The size of the instrument has been reduced with a more compact 
aligning mechanism. 

We have demonstrated a tissue penetration depth with the dual axes confocal microscope 
that is unmatched by any other endoscope-compatible instrument. From our in vivo 
experiments, fluorescence images can be collected up to a depth of 300 µm, limited by the
maximum travel of the piezoelectric actuator. Greater depths have been achieved with our
tabletop instruments (>500 µm). These results demonstrate the large working distance and
high dynamic range of the dual axes confocal architecture to enable deep subsurface tissue 
imaging. Further improvements in performance can be achieved by increasing light 
throughput. The relatively low output power of 2 mW can be significantly increased with 
use of either silver or gold coatings as the reflective surfaces of the MEMS scanner and 
parabolic mirror, rather than aluminum. In addition, a higher power fiber-coupled laser 
source can be used. 

Future development of dual axes confocal architecture will focus on achieving the 
theoretical levels of performance in a miniature instrument package. In addition, 
repeatability and reliability will be addressed. We will take advantage of the high dynamic 
range of the system by developing new z-axis actuators that rapidly scan the focal volume 
perpendicular to the tissue surface to achieve deep penetration in vertical cross-sections. 
This orientation provides a powerful view for studying the epithelium and presents a 
comprehensive picture of the biological differentiation patterns in this thin layer of 
metabolically active tissue. The epithelium forms the inner lining of all hollow organs, and 
is accessible by medical endoscopes. In addition, we will extend this approach to multi- 
spectral imaging capabilities by developing achromatic optics using the same basic optical 
design. Finally, smaller form factors will be developed to achieve compatibility with
standard medical endoscopes. 

As this novel approach matures, we will be able to use this high resolution imaging 
instrument to perform clinical investigation in human subjects and longitudinal studies in 
small animal models. Molecular specificity can be achieved by combining this 
microendoscope with the use of affinity probes that bind to overexpressed cell surface targets.
This integrated imaging methodology will provide the ability to visualize molecular
features of tissue micro-structures in the vertical plane with sub-mucosal axial depths. This
powerful capability has tremendous potential to unravel previously unknown molecular
mechanisms of important disease processes, such as cancer and inflammation.
1. Introduction



Recently, the need for high-speed semiconductor devices has increased remarkably. To
achieve such device characteristics, we have paid attention to devices with
localized stress, such as Si-Ge devices (Welser et al. 1994, Rim et al. 2000, Tezuka et al. 2001).
Electron and hole mobility can be influenced by the stress distribution in the active layer of
very fine devices. Thus, the strength of the stress has to be controlled as a 2- or 3-dimensional
distribution when developing advanced devices, and it is very important to measure and
analyze the stress in very fine semiconductor devices. We expect to analyze the
stress with Raman spectroscopy. In the Raman spectrum, the Stokes and anti-Stokes
scattering peaks shift when the local area of the device is compressed or stretched.
On the other hand, the minimum pattern size of semiconductor devices is expected to shrink
to the region of 25 nm in the future. The stress distribution of such a
structure has to be analyzed with a very fine probe in Raman spectroscopy. In optical
microscopic Raman spectroscopy, the diffraction limit prevents us from obtaining spectra
with a resolution finer than the submicron range. To overcome the diffraction limit, scanning
near-field optical microscopy (SNOM), which uses a small aperture to obtain a resolution
below the wavelength of the optical probe, was proposed by E. H. Synge in 1932. SNOM
technology with a resolution of a few tens of nanometers was then reported by D. Pohl in 1986
(Durig et al., 1986, Fischer et al. 1989). The technology is based on a near-field optical probe
formed by a small aperture on a metal-coated probe or fiber probe (Trautman et al. 1994).
Another approach, using apertureless near-field optics, was proposed by J. Wessel in 1985. The
concept is based on surface-enhanced Raman spectroscopy (SERS). Since 1985, SERS and
SNOM technologies have been advanced by using scanning tunneling microscopy (STM) and
atomic force microscopy (AFM) techniques. So far, many researchers have reported
work using various types of near-field optical microscopes.

In Raman spectroscopy, the significant technical issue is how to detect the very weak Raman
scattering signal while increasing the spatial resolution. To improve the signal, tip-enhanced
Raman spectroscopy (TERS) was proposed by A. Hartschuh, L. Novotny et al. in 2003.
Many researchers have proposed various types of TERS: bottom illumination (Stockle et al.,
2000, Hayazawa et al., 2003), side illumination (Nieman et al., 2001, Hayazawa et al. 2002,
Mehtani et al. 2005), and modified top illumination (Poborchii et al. 2005). The concepts are
based on SERS, in which enhancement of the signal is obtained in the vicinity of a metal
particle or tip. The metal particle or metal tip can be used as a Raman scattering near-field
probe with a larger Raman scattering amplitude. Although TERS technology has high spatial
resolution based on Raman scattering signal enhancement in the vicinity of the metal particle,
the metal particle or metal tip has the serious problem that it is a contamination
source in the Si device fabrication process.

In SNOM or NSOM (near-field scanning optical microscopy), many researchers have also
reported various configurations such as illumination mode, collection mode,
illumination-collection mode, etc. (Pohl, 1986, Trautman et al. 1994). They are based on the
formation of a fine electromagnetic field probe in the vicinity of a small aperture on the metal
tip, which serves as the near-field optical probe. Although the aperture size controls the size
of the near-field optical probe, the microscopic image is determined by the signal-to-noise
ratio. Both the probe size and the optical power throughput are therefore very important. So
far, good spatial resolution has not been demonstrated
using aperture-type SNOM (Hosaka et al., 1996, Ono et al., 2005).

In near-field Raman spectroscopy, M. Yoshikawa et al. have reported the development of
near-field optical Raman spectroscopy with illumination-collection mode and a fine-aperture
pyramidal probe (Yoshikawa et al., 2006). Using resonant Raman scattering, they obtained the
2-dimensional stress distribution of a VLSI standard sample made of Si and SiO2, which is
used for checking AFM. The resolution is, however, only about 250 nm even though they
employed near-field light in Raman spectroscopy, although the Raman peak shift image is
improved compared with optical microscopic Raman spectroscopy. In the SNOM research of
S. Hosaka's group, the resolution was likewise only slightly improved by near-field light when
using a metal aperture probe in illumination-collection mode SNOM. The group has already
pointed out that an optical aperture probe is limited in how far it can improve the spatial
resolution because the optical probe has two components, a near-field and a far-field optical
probe (Hosaka et al., 1999). The far-field optical probe power is so much larger than that of
the near-field probe that it masks the near-field optical probe. Recently, a spatial resolution
of less than 10 nm was obtained using a surface plasmon effect
near-field optical probe (Fischer et al., 1989) in an illumination-collection and depolarization
mode SNOM (Hosaka et al., 2007).

The metal aperture on the outside of the top of the pyramidal probe causes metal
contamination on the device surface, as described above. To solve these problems, we
propose a way to obtain a fine near-field optical probe while preventing contamination. The
idea is to use a structure in which the optical probe is formed by plasmon resonance without
the outside metal and its aperture. The probe is based on a commercial pyramidal probe. The
outside and inside layers are made of an insulator (Si3N4 or SiO2) and metal, respectively. This
structure protects the device from contamination. However, it is uncertain whether a fine
near-field optical probe can be obtained without the aperture. It is therefore of great interest
to study the aperture-less
pyramidal probe for high-resolution imaging (Hosaka et al., 2007).

In this section, I describe the recent state of research on TERS and SNOM technologies. Then, I
describe SNOM technology with regard to a plasmon resonance optical probe using an
apertureless cantilever, a depolarization optical system, Raman spectroscopy, etc. I describe
the estimation of near-field light propagation from the aperture-less pyramidal probe using
the FDTD method. I also describe some features of a prototyped atomic force cantilevered
SNOM with the aperture-less pyramidal probe and its combination with Raman spectroscopy,
and discuss the possibility of detecting Raman spectra for measuring the fine stress
distribution of semiconductor devices.
2. Raman scattering, TERS and SNOM technologies

2.1 Raman scattering 

When light is incident on the sample, the reflected light is modulated by the lattice vibration
of the sample. Assuming that the polarizability of the sample is given by
$\alpha = \alpha_0 + \alpha_1 \cos 2\pi\nu_s t$ and the electric field is given by
$E = E_0 \cos 2\pi\nu t$, the induced dipole moment $P$ is given by $P = \alpha E$, where
$\nu_s$ and $\nu$ are the vibration frequencies of the sample and the incident light,
respectively. $P$ is given by Eq. (1).

$$
\begin{aligned}
P = \alpha E &= (\alpha_0 + \alpha_1 \cos 2\pi\nu_s t)\, E_0 \cos 2\pi\nu t \\
&= \alpha_0 E_0 \cos 2\pi\nu t + \tfrac{1}{2}\alpha_1 E_0 \cos 2\pi(\nu - \nu_s)t + \tfrac{1}{2}\alpha_1 E_0 \cos 2\pi(\nu + \nu_s)t
\end{aligned}
\tag{1}
$$



The induced dipole moment has three components, as shown in the above equation. The first,
second and third terms correspond to Rayleigh scattered light, Stokes scattered light and
anti-Stokes scattered light, respectively, as shown in Fig. 1(a). Therefore, the Stokes and
anti-Stokes scattered light carries information on the sample's lattice vibration, polarizability,
atomic binding, stress, etc.
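
The three-component structure of Eq. (1) can be checked numerically. The following sketch builds the induced dipole moment as the product of a modulated polarizability and an oscillating field, then inspects its spectrum. The frequencies and amplitudes are arbitrary illustrative values (not physical Raman frequencies), and the function name is hypothetical.

```python
import numpy as np


def dipole_spectrum(nu=50.0, nu_s=5.0, a0=1.0, a1=0.2, e0=1.0, fs=1000.0, n=1000):
    """Return (frequencies, single-sided amplitude spectrum) of P = alpha * E.

    alpha(t) = a0 + a1*cos(2*pi*nu_s*t) and E(t) = e0*cos(2*pi*nu*t).
    With fs = n the frequency resolution is 1 Hz, so all lines fall on bins.
    """
    t = np.arange(n) / fs
    p = (a0 + a1 * np.cos(2 * np.pi * nu_s * t)) * e0 * np.cos(2 * np.pi * nu * t)
    amp = 2.0 * np.abs(np.fft.rfft(p)) / n
    freqs = np.fft.rfftfreq(n, 1.0 / fs)
    return freqs, amp


freqs, amp = dipole_spectrum()
lines = freqs[amp > 0.05]
print(lines)  # lines at nu - nu_s (Stokes), nu (Rayleigh), nu + nu_s (anti-Stokes)
```

The central line has amplitude a0*e0 and each sideband has amplitude a1*e0/2, in agreement with the factors of 1/2 in Eq. (1).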

Furthermore, when compressive or tensile stress is applied to the sample, the frequencies of
the Stokes and anti-Stokes scattering peaks are shifted. This peak-frequency shift (peak shift)
is illustrated in Fig. 1(b). Peak shifts to lower and higher frequencies indicate tensile stress
and compressive stress, respectively. The stress $\sigma$ can be estimated from the shift
$\Delta\nu$ as represented by Eq. (2).

$$
\sigma = 2.3 \times 10^{4}\, \Delta\nu \quad [\mathrm{N\,cm^{-2}}]
\tag{2}
$$
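
As a rough numerical illustration of Eq. (2), the sketch below converts a peak shift into a stress estimate. It assumes that the shift is expressed in cm⁻¹, which the text does not state explicitly, and it uses the sign convention given above (a shift to higher frequency means compressive stress). The function names are hypothetical.

```python
STRESS_PER_SHIFT = 2.3e4  # N cm^-2 per cm^-1, coefficient taken from Eq. (2)


def stress_from_shift(delta_nu_cm1: float) -> float:
    """Stress estimate in N cm^-2 for a Raman peak shift given in cm^-1."""
    return STRESS_PER_SHIFT * delta_nu_cm1


def stress_type(delta_nu_cm1: float) -> str:
    """Classify the stress from the sign of the peak shift."""
    if delta_nu_cm1 > 0:
        return "compressive"
    if delta_nu_cm1 < 0:
        return "tensile"
    return "none"


# Example: a shift of +0.5 cm^-1 relative to the unstressed Si peak
print(stress_from_shift(0.5), stress_type(0.5))
```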
Fig. 1. Raman scattering phenomenon (a) and Raman peak shifts due to tensile
and compression stress (b).



2.2 TERS 

TERS is typically implemented in three configurations: bottom illumination, side illumination
and modified top illumination, as shown in Fig. 2. A. Hartschuh et al. have reported near-field
Raman spectroscopy with a spatial resolution of 20 nm, demonstrated using a bottom






Fig. 2. Schematic diagrams of some TERS configurations: (a) bottom illumination, (b) side
illumination and (c) modified top illumination.

illumination mode TERS on the vibrational modes of single-wall carbon nanotubes (SWNTs)
(Hartschuh et al., 2003). The tip material was gold. The tip force was controlled within
10-50 pN using tuning fork detection in AFM. They demonstrated an enhancement of the
TERS signals of the G and G' bands by comparing near-field and far-field spectra, as shown in
Fig. 3. They showed a Raman scattering image of SWNTs using the G' band peak, together
with topographic and Raman scattering signal profiles (Fig. 4). The G' band Raman signal
profile indicated that the spatial resolution was less than 30 nm because the FWHM of the
SWNT profile measured with the G' band peak was about 26 nm.
Fig. 3. Raman spectra detected with a sharp metal tip (green line) on top of the
sample (distance: 1 nm) and with the tip retracted by 2 µm (black line). Note, the intensities
of all Raman bands are increased with the tip close to the SWNTs (Hartschuh et al., 2003).

D. Mehtani et al. have shown the great potential of side illumination mode TERS for
nanoscale chemical and semiconductor characterization (Mehtani et al., 2005). They
demonstrated enhancement of the Raman scattering signal of various molecular, polymeric and
semiconducting materials as well as carbon nanotubes (CNTs) by comparison with the far-field
Raman signal. V. Poborchii et al. have introduced modified top illumination mode TERS
using a silver particle on the top of a quartz AFM cantilever probe immersed in a glycerol
Fig. 4. (a) Raman image of SWNT bundles acquired by raster scanning a sharp metal tip and
detecting the intensity of the G' band (scan area 3 x 1 µm, integration time 5 ms per pixel), (b
and c): Cross-sections taken along the white dotted line in (a) for topography (b) and Raman
signal (c) (Hartschuh et al., 2003).

droplet on a Si surface (Poborchii et al. 2005). In their experiment, a spatial resolution in
the range of 100 nm was demonstrated. The system used depolarization optics
to reject the 364 nm primary light. As described above, TERS technology has tremendous
potential to enhance the Raman scattering signal compared with the far-field Raman scattering
signal, but the best spatial resolution was about 30 nm. This is not sufficient for applying the
technology to measurements of Si devices.



2.3 SNOM

Research on this technology has focused on improving the near-field optical probe
to obtain a very fine probe with high contrast against far-field light. At first, S. Hosaka et al.
changed from an optical fiber probe to an AFM cantilevered pyramidal optical probe because
the fiber probe was broken by hard contact between the probe and the sample. The
cantilevered optical probe makes it possible to observe nanometer-sized pits formed by
electron beam (EB) drawing. They succeeded in observing small pits of 30 nm x 160 nm by
using polarized near-field light in illumination-collection mode SNOM. In the experiments,
when they adopted an optical aperture on the cantilever, they had to focus the illuminating
laser beam into the aperture and to provide separate laser beam deflection optics for atomic
force detection (optical lever). To meet these requirements, they had to develop a
through-the-lens (TTL) type optical lever. However, they have reported that illumination
mode SNOM with an aperture on the top of a metal probe has two components, near-field and
far-field probes, in the vicinity of the aperture (Fig. 5). The near-field probe power was very
small compared with that of the far-field probe. We needed to remove the far-field light and
to enhance the near-field probe power (high throughput) with a small diameter. For the
former, we adopted depolarization optics to detect only the reflected near-field light without
the far-field light reflected from the pyramidal probe. The illumination-collection mode
SNOM optics was developed as shown in Fig. 6. The system obtained a spatial resolution of
300 nm from a near-field Kerr effect image of perpendicular magnetic recorded bits (Fig. 7).
Such optics and probe were thus not sufficient to obtain nanometer resolution. The low
Fig. 5. Estimated optical probe profile (a) and the tip structure and its SEM image
(b) (Hosaka et al., 1999).

Fig. 6. Scheme of the illumination-collection SNOM system (Hosaka et al., 2007)

resolution was caused by the aperture formed on the top of the metal probe. In Raman
spectroscopy, M. Yoshikawa et al. have reported that they developed a tuning fork AFM
cantilevered illumination-collection type SNOM for measuring the stress distribution in a VLSI
standard sample, which is used as an AFM check sample. The experimental result showed a
resolution of about 250 nm in the peak-frequency shift image of the sample around the
520 cm⁻¹ Si peak (Fig. 8). These data did not show a high spatial resolution of less than 50 nm.
This might be caused by the use of a near-field optical probe with an aperture. Furthermore,
the probe has the problem of causing contamination on Si devices. Therefore, we need an
optical probe with no metal on its outer surface. We have proposed a plasmon effect
near-field optical probe with an inner Au film on a conventional pyramidal AFM probe made
of SiN with no aperture, as described in the next section.
Fig. 7. SNOM observation of 640MB magneto-optical disc (2mmx2mm); (a) AFM image and
(b) SNOM image (Kerr effect image), (c) top view of the aperture and (d) pyramidal probe
(Ono et al., 2005).




Fig. 8. The near-field Raman scattering Si images of (a) peak-frequency, and (b) stress in VLSI
standards, measured by the pyramidal probe with a diameter of 100 nm. The optical
microscope images of (c) peak-frequency, and (d) stress in VLSI standards, respectively
(Yoshikawa et al., 2006).



3. Illumination-collection mode SNOM with aperture-less pyramidal probe 

To examine whether a fine near-field optical probe can be obtained from the top of the
aperture-less pyramidal AFM probe coated on the inside with a metal film, we have studied
light propagation through the top of the probe under illumination by an ultraviolet (UV) laser
with a wavelength of 363.8 nm using the finite-difference time-domain (FDTD) method.
Fig. 9. Calculation model for near-field light emission from aperture-less cantilevered probe
with metal film using FDTD method.

Figure 9 shows the AFM cantilever image and a scheme of the probe used as the SNOM probe.
The cantilever is a commercially available pyramidal one, model OMCL-TR400PSA-1
made by Olympus Inc. Figure 9(b) shows the calculation model, an enlarged view of the
top of the pyramidal probe based on Fig. 9(a), with the UV laser illuminating the inside of the
probe. We executed the FDTD calculation with a very fine mesh size of 6 nm. As a result, we
obtained near-field light profiles emitted from the top of the pyramidal probe for various
metal films, as shown in Fig. 10. The near-field light power increases in the order of
aluminum, gold and silver. In the cases of Au and Ag, surface plasmons may occur on the
inside metal film. In addition, the near-field light can propagate along the pyramidal
surfaces and be emitted from the top. Figure 11 shows the light propagation from the top. The
light propagates only into a shallow region away from the top, with a size of <200 nm in FWHM.
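
To give a feeling for what a 6 nm mesh means at 363.8 nm, the sketch below computes the number of cells per wavelength and the largest stable time step from the standard Courant condition for a cubic FDTD grid. The time step and the dimensionality of the calculation are not given in the text, so the 3D case and the refractive index parameter `n_medium` are assumptions, and the function names are hypothetical.

```python
import math

C0 = 299_792_458.0  # speed of light in vacuum, m/s


def cells_per_wavelength(wavelength_m, cell_m, n_medium=1.0):
    """Number of mesh cells per wavelength inside a medium of index n_medium."""
    return (wavelength_m / n_medium) / cell_m


def courant_dt_max(cell_m, dims=3):
    """Largest stable FDTD time step for a uniform cubic grid in vacuum."""
    return cell_m / (C0 * math.sqrt(dims))


print(cells_per_wavelength(363.8e-9, 6e-9))   # about 60 cells per vacuum wavelength
print(courant_dt_max(6e-9))                   # on the order of 1e-17 s
```

About 60 cells per vacuum wavelength is a dense sampling. Inside the dielectric and near the metal film the effective wavelength is shorter, so the sampling there is correspondingly coarser.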
Fig. 10. Calculated images of near-field light emission from the top of aperture-less
cantilevered probe with various metal films (50 nm) when illuminating an UV light with a
wavelength of 363.8 nm (Hosaka et al., 2007).

Figure 12 shows calculated results of the near-field light propagation through the top for
various Au film thicknesses under UV laser illumination. The figure shows that near-field
light is emitted from the tip through the metal film and SiN dielectric material even when
a thick metal film is used. The power decreases gradually with thickness. The profiles of the
optical probe have almost the same shape. As described above, the near-field optical
probe can be used even with the aperture-less pyramidal probe. The optical profiles, however,
indicate that the estimated near-field optical probe size is too large to detect details of the
light image or a localized Raman signal with a resolution of <100 nm.
Fig. 11. Calculated results of near-field light propagation for the distances from the surface of
the aperture-less cantilevered probe with various metal films (thickness: 50 nm), (a) in-plane
distribution and (b) near-field light propagation above the sample surface (Hosaka et al.,
2007).
Fig. 12. Calculated images of near-field light emission from the top of aperture-less metal
probe for various metal film thicknesses, (a) 60 nm-thick Au film, (b) 30 nm-thick Au film,
and (c) without the metal film at incident light of UV light with 363.8 nm in

4. Prototype atomic force cantilevered SNOM system (Ono et al., 2005) 

The prototyped SNOM system has four functions: (1) to keep the gap between the probe and
the sample constant by controlling the Z-position of the sample using the atomic force detected
with the through-the-lens (TTL) type optical lever (AFM function), (2) to generate near-field
light emitted from the top of the probe by illuminating the inside of the
pyramidal probe with the laser beam (illumination-collection mode SNOM function), (3) to
detect only the near-field light reflected from the sample surface using polarization
(depolarization mode SNOM function) or Raman spectroscopy, and (4) to adjust the laser
beams incident on the fixed positions of the cantilever using a charge-coupled device (CCD)
camera for optical lever and near-field light adjustments. To achieve these functions, we have
developed multi-beam optics whose optical axes coincide at the objective lens. The main optics
is shown in Fig. 6. In SNOM or Kerr effect detection, a He-Ne laser beam with a wavelength
of 632.8 nm (red laser) and a semiconductor laser beam with a wavelength of 532 nm (green
laser) were used for near-field light and for atomic force detection by the optical lever,
respectively. In Raman spectroscopy, the UV line of an Ar ion laser with a wavelength of
363.8 nm was used.
The objective lens had a numerical aperture (NA) of around 0.5 in both cases.
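
For comparison with the near-field results that follow, the far-field resolution of this objective can be estimated with the Rayleigh criterion, 0.61 λ/NA. This criterion is a standard textbook estimate and is not taken from the text; the function name is hypothetical and NA = 0.5 is the approximate value quoted above.

```python
def rayleigh_resolution_nm(wavelength_nm: float, na: float) -> float:
    """Far-field two-point resolution from the Rayleigh criterion, in nm."""
    return 0.61 * wavelength_nm / na


for wavelength in (363.8, 632.8):
    print(wavelength, round(rayleigh_resolution_nm(wavelength, 0.5), 1))
```

Both values are in the range of several hundred nanometers, which illustrates why conventional optical microscopic Raman spectroscopy cannot resolve structures of a few tens of nanometers.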
The depolarization optics consists of a He-Ne laser source, a half-wave (λ/2) plate,
quarter-wave (λ/4) plates, an AFM cantilever with a small aperture, a G-T analyzer, and a
photomultiplier tube (PMT). The linearly polarized light emitted from the He-Ne laser
is converted into circularly polarized light by the λ/4 plate and is focused into the
inside of the probe on the cantilever tip. After the polarized near-field light is reflected from
the sample surface through or outside of the top, its plane of polarization is slightly rotated.
By passing the reflected light through the λ/4 plate, it is
converted to linearly polarized light with various angles. The G-T analyzer is adjusted to
remove the far-field light reflected from the inside wall of the probe. On the other hand, the
near-field light reflected from the sample surface has a slight shift of polarization, so that only
the polarization-rotated near-field light is detected through the G-T analyzer.
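
The principle of the depolarization detection can be illustrated with Jones calculus. The sketch below is a simplified model with ideal optical elements: it shows that a λ/4 plate at 45° converts linear into circular polarization, and that an analyzer crossed with the unrotated light blocks it while passing a fraction sin²θ of light whose polarization plane has rotated by θ. It does not model the probe, the double pass through the λ/4 plate, or the actual alignment of the instrument; all function names are hypothetical.

```python
import numpy as np


def rotation(theta):
    """2x2 rotation matrix for an angle theta in radians."""
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, -s], [s, c]])


def quarter_wave_plate(theta):
    """Jones matrix of an ideal lambda/4 plate, fast axis at theta (global phase dropped)."""
    return rotation(theta) @ np.diag([1.0, 1.0j]) @ rotation(-theta)


def linear_polarizer(phi):
    """Jones matrix of an ideal linear polarizer with transmission axis at phi."""
    c, s = np.cos(phi), np.sin(phi)
    return np.array([[c * c, c * s], [c * s, s * s]])


def analyzer_leakage(rotation_angle):
    """Intensity passed by an analyzer crossed with x-polarized light
    after the plane of polarization has rotated by rotation_angle."""
    e_in = rotation(rotation_angle) @ np.array([1.0, 0.0])
    e_out = linear_polarizer(np.pi / 2) @ e_in
    return float(np.sum(np.abs(e_out) ** 2))


# Linear (x) light through a lambda/4 plate at 45 degrees becomes circular:
circular = quarter_wave_plate(np.pi / 4) @ np.array([1.0, 0.0])
print(np.abs(circular))          # equal magnitudes in x and y
print(analyzer_leakage(0.0))     # unrotated light is blocked
print(analyzer_leakage(0.01))    # slightly rotated light leaks through
```

Because the leakage grows as sin²θ, even a small rotation produces a signal on a nearly dark background, which is the reason a crossed analyzer is effective for isolating the polarization-rotated component.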

Figure 13 shows AFM and depolarization SNOM images of a sputtered Au film on
glass in comparison with an SEM image. The system observed not only many fine cracks in
the film in the AFM image, but also bright narrow lines with a width of <10 nm in the SNOM
image. The images indicate that both the AFM and SNOM functions have a very fine
resolution of less than 10 nm.
Fig. 13. Test sample observation of Au film; (a) SEM image, (b) AFM image and (c) SNOM
image (Hosaka et al., 2007).

The optical system observed a near-field Kerr effect image of bits recorded on a conventional
gigabyte-class magneto-optical (MO) disc, as shown in Fig. 14. From the rise at the signal
edge, the resolution of the Kerr effect image was less than 20 nm, taking into account the
magnetic domain wall between switched bits. Comparing these data with the previous image
(Fig. 7), the aperture-less pyramidal SNOM has the potential to achieve fine spatial
resolution in near-field light images.
Fig. 14. AFM and SNOM images of 2.3 GB-MO recorded bits (MO disc spec: min. bit length
233 nm, track pitch: 670 nm) (Hosaka et al., 2007).



5. Combination of SNOM and Raman spectroscopy (Hosaka et al. 2007) 

We have prototyped a combined system consisting of the above system (Fig. 6) and a Raman
spectrometer, Nanofinder30, made by Tokyo Instruments Inc., as shown in Fig. 15. The
excitation laser was the UV line of an Ar ion laser with a wavelength of 363.8 nm. The figure
shows a scheme of the optical functions of the system. In practice, the spectrometer
was inserted between the TTL type optical lever system and the depolarization optics. A
sample with gates and shallow trench isolation (STI) on a Si substrate was prepared for
measuring its stress distribution.
Fig. 15. Scheme of prototyped SNOM Raman spectroscopic microscopy (Hosaka et al., 2007).
Figure 16 shows the change in Raman spectra before and after AFM operation when
illuminating the inside of the aperture-less pyramidal probe with the UV laser. Figure 16(a)
shows the spectrum when the sample was far from the probe; the Si Raman peak disappeared.
When the system was controlled in contact mode AFM, the peak at about 520 cm⁻¹ was
obtained (Fig. 16(b)). When the optics were adjusted and a higher UV laser power was used, a
strong signal was obtained, as shown in Fig. 16(c).
Fig. 16. Raman shift spectra of the silicon sample when illuminating UV light with a
wavelength of 363.8 nm to inside of the pyramidal probe, (a) in separation between the
probe and the sample, (b) under contact mode controlling at 20 nN, and (c) incident laser
power of 1.5 mW (Hosaka et al., 2007).

Figures 17(a) and 17(b) show spectra detected by near-field optical Raman spectroscopy
(NFRS) and optical microscopic Raman spectroscopy (OMRS), respectively, at an accumulation
time of 16 sec. OMRS is easily achieved by removing the cantilever probe.
Although the detected intensity of NFRS is weaker than that of OMRS, it is clear that we
detected the Raman scattering peak of Si using the system and the aperture-less pyramidal
probe.
Fig. 17. Comparison of spectra in near-field optical Raman spectroscopy (NFRS) (a)
and optical microscopy Raman spectroscopy (OMRS) (b) (Hosaka et al., 2007).
6. Raman spectroscopy of Si device test sample (Hosaka et al.)

The test sample structure and the anticipated Raman peak shift of Si are shown in Fig. 18. The
distribution of compressive and tensile stresses in the sample can be estimated. In the gate,
compressive stress occurs because of oxidation of both sides of the gate. In the Si area between
the STI and the gate, tensile stress occurs because the volume of the STI SiO2 shrinks on
annealing.
Fig. 18. A structure of the test sample, and anticipated stress model and Raman peak shift.

Using the sample, the surface structure was observed by the AFM function of the prototyped
SNOM and by scanning electron microscopy (SEM). A small gate dimension of about 25 nm
can be observed, as shown in Fig. 19. Then, we measured the Raman spectrum around
520 cm⁻¹ at each pixel along one line of the sample to obtain the peak-frequency shift and
stress distribution.

Fig. 19. SEM and AFM images of the sample (the AFM image was taken by the SNOM system),
and stress model.

After the measurement, the peak-frequency shift was determined by Lorentzian peak fitting of
the spectrum to obtain a fine spectral resolution of less than 0.1 cm⁻¹. The peak-frequency
shift profile is shown in Fig. 20. The profile agrees well with the anticipated profile. The peak
of compressive stress appears at the gate. The FWHM of the peak is less than 25 nm. The
result was obtained when a focused laser beam with a power of 1.5 mW illuminated
the inside of the pyramidal AFM probe. The Au film coated inside the probe had a thickness
of 50 nm. The highest compressive stress was estimated to be 1-1.5 × 10⁴ N cm⁻², which is
consistent with the result reported by M. Yoshikawa et al. Therefore, the technology has the
potential to obtain near-field Raman scattering profiles with a fine spatial resolution of less
than 25 nm.
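
Peak fitting is what allows the peak position to be located more finely than the sampling interval of the spectrum. The text does not describe the fitting routine, so the sketch below uses one simple approach as an assumption: the reciprocal of a Lorentzian is a parabola, so a quadratic fit to 1/y near the maximum gives the center directly. The synthetic spectrum, its parameters and the function names are illustrative only.

```python
import numpy as np


def lorentzian(x, x0, gamma, amp):
    """Lorentzian line: center x0, half width at half maximum gamma, height amp."""
    return amp * gamma**2 / ((x - x0) ** 2 + gamma**2)


def fit_lorentzian_center(x, y, frac=0.5):
    """Estimate the peak center from a parabola fitted to 1/y near the maximum.

    Only points with y >= frac * max(y) are used. A baseline-free spectrum
    is assumed; real data would need baseline subtraction first.
    """
    mask = y >= frac * y.max()
    xm = x[mask].mean()                      # center x to keep the fit well conditioned
    a, b, _ = np.polyfit(x[mask] - xm, 1.0 / y[mask], 2)
    return xm - b / (2.0 * a)


# Synthetic peak sampled every 0.5 cm^-1, with its center between two samples
x = np.arange(510.0, 530.0, 0.5)
y = lorentzian(x, 520.32, 1.5, 1000.0)
print(fit_lorentzian_center(x, y))
```

For noisy spectra a weighted nonlinear least-squares fit is usually preferred, since taking the reciprocal amplifies noise in the wings of the peak.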
Fig. 20. AFM image of the gate and Raman spectrum peak shift around the gate.



7. Summary 

Recent technologies of tip-enhanced Raman spectroscopy (TERS) and scanning near-field 
optical microscopy (SNOM) with Raman spectroscopy are reviewed. 

TERS has been developed on the basis of surface-enhanced Raman spectroscopy 
(SERS). Several TERS modes were reviewed, as follows: 

1. Bottom-illumination mode TERS has been described; it has demonstrated fine 
spatial resolution and gigantic enhancement of the Raman scattering signal using 
single-wall carbon nanotubes (SWNTs). 

2. Side- and modified top-illumination mode TERS have been described; they have 
demonstrated enhancement of the Raman scattering signal and subwavelength 
resolution. 

3. There are, however, some technical issues, such as metal contamination and the stress 
measurement of Si devices. This makes TERS difficult to apply to the evaluation of 
semiconductor devices and similar samples. 

In SNOM technologies, illumination-collection mode SNOM has been described with regard 
to the aperture-type SNOM probe and the aperture-less pyramidal SNOM probe. 

4. With the aperture-type SNOM probe, it is difficult to obtain fine spatial resolution 
because the aperture forms only an incomplete near-field optical probe. 

5. M. Yoshikawa et al. demonstrated a spatial resolution of about 200 nm in a peak-frequency 
shift image of the Si Raman peak of a VLSI standard sample, using illumination-collection 
mode with the aperture-type SNOM probe. 

6. S. Hosaka et al. have proposed the aperture-less pyramidal probe in illumination-collection 
mode SNOM to improve the near-field optical probe and to protect the device surface from 
metal contamination. SNOM and Raman spectroscopy were combined using a UV laser. 
The prototype microscope demonstrated the following possibilities. 

a. FDTD calculations of near-field light propagation in the aperture-less pyramidal probe 
show that strong near-field light emerges from the tip of the probe owing to the surface 
plasmon effect when Au or Ag is used as the inside metal film. 

b. A very fine SNOM image of a crack network narrower than 10 nm in a thin Au film 
was obtained with the aperture-less pyramidal probe SNOM, at a spatial 
resolution of less than 10 nm. 

c. The combined SNOM and Raman spectroscopy system has the potential to 
detect Raman-scattered light using the aperture-less pyramidal probe. 

d. The system measured the Si sample with STI and gate structures to obtain both the AFM 
image and the Si peak-frequency shift in Stokes scattering. 

e. The system observed compressive and tensile stress of (1–1.5)×10⁴ N cm⁻² on the 
sample. 

f. A spatial resolution of less than 25 nm was demonstrated in Raman spectroscopy. 

As described above, scanning near-field Raman spectroscopic microscopy has the potential 
to measure details of a sample, such as structure, magnetization, optical properties, 
chemical characterization, and stress distribution, with a fine spatial resolution of 25 nm. In 
the future, the technology is expected to become one of the key technologies for the evaluation 
of materials. 